Summary
We study the constraints that govern the distribution of symbolic patterns (letters, numerals, and other glyphs used for communication) and natural patterns in high-dimensional feature spaces, with a view to gaining insight into the complexity of classification tasks. Pattern vectors from several data sets of printed and hand-printed digits are standardized to zero mean and identity covariance matrix via principal component analysis, mean shifting, and scaling. The probability density of the radius of the set of patterns (their distance from the origin) is computed and shown to predict accurately the observed average radius for a wide range of features and dimensionalities. We predict further that the class centroids of symbolic patterns will form the vertices of a regular simplex (i.e., a d-dimensional tetrahedron). The 45 observed pairwise distances between the class centroids in ten-class problems are shown to be almost equal to the value predicted from the average radius of the class centroids. The class-conditional distributions of the patterns are compared using two measures of divergence. The difference between the distributions of the same class with different feature sets is found to be larger than the difference between the distributions of different classes with the same feature set. This suggests that the correlation among features of patterns of one class can predict the correlation among features of patterns in another class. The amount of within-source consistency in a data set is quantified using an entropy measure that takes into account small-sample effects. The statistical dependence between the features of same-source patterns of different classes is measured by mutual information applied to the discrete distributions resulting from quantization of the style assignments.
If these observations are supported by further studies of symbolic and natural patterns with diverse data sets, they may eventually lead to improved classification methods for same-source ensembles of symbolic patterns.
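The standardization and the simplex geometry described above can be illustrated with a minimal sketch. It is not the authors' code: the synthetic data, the function names `whiten` and `simplex_edge`, and the sample sizes are assumptions made for illustration. The edge-length formula is the general relation for a regular simplex centred at the origin with k vertices at radius r, which may differ in form from the prediction used in the chapter.

```python
import numpy as np

def whiten(X):
    """Shift to zero mean, rotate onto principal axes, scale to unit variance.

    Hypothetical helper: returns data whose sample covariance is the identity.
    """
    Xc = X - X.mean(axis=0)                  # zero mean
    cov = np.cov(Xc, rowvar=False)           # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(cov)   # principal axes and variances
    return (Xc @ eigvecs) / np.sqrt(eigvals)

def simplex_edge(radius, k):
    """Edge length of a regular simplex with k vertices at distance
    `radius` from its centroid (geometric identity, not taken from the chapter)."""
    return radius * np.sqrt(2.0 * k / (k - 1))

# Synthetic correlated features (an assumption; the chapter uses digit data sets).
rng = np.random.default_rng(0)
n, d = 5000, 20
A = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ A + 3.0

Z = whiten(X)
radii = np.linalg.norm(Z, axis=1)            # distance of each pattern from the origin
mean_sq_radius = np.mean(radii ** 2)         # close to d for identity covariance

# Ten vertices of a regular simplex centred at the origin: 45 equal pairwise distances.
k = 10
V = np.eye(k) - 1.0 / k
r = np.linalg.norm(V, axis=1).mean()         # average centroid radius
D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)[np.triu_indices(k, 1)]

print(f"mean squared radius: {mean_sq_radius:.3f} (dimension d = {d})")
print(f"pairs: {D.size}, edge from radius: {simplex_edge(r, k):.4f}")
```

After whitening, the mean squared radius is approximately the dimension d, since it equals the trace of the (identity) covariance matrix. With real class centroids the pairwise distances would only approximate `simplex_edge(r, k)`; how closely they do so is what the chapter reports.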
© 2006 Springer Verlag London Limited
About this chapter
Cite this chapter
Nagy, G., Zhang, X. (2006). Simple Statistics for Complex Feature Spaces. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_9
DOI: https://doi.org/10.1007/978-1-84628-172-3_9
Publisher Name: Springer, London
Print ISBN: 978-1-84628-171-6
Online ISBN: 978-1-84628-172-3
eBook Packages: Computer Science (R0)