Simple Statistics for Complex Feature Spaces

Nagy, George; Zhang, Xiaoli

doi:10.1007/978-1-84628-172-3_9

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

We study the constraints that govern the distribution of symbolic patterns (letters, numerals, and other glyphs used for communication) and natural patterns in high-dimensional feature spaces, with a view to gaining insight into the complexity of classification tasks. Pattern vectors from several data sets of printed and hand-printed digits are standardized to identity covariance matrix variables via principal component analysis, shifting to zero mean and scaling. The probability density of the radius of the set of patterns (their distance from the origin) is computed and shown to predict accurately the observed average radius for a wide range of features and dimensionality. We predict further that the class centroids of symbolic patterns will form the vertices of a regular simplex (i.e., a d-dimensional tetrahedron). The observed pairwise distances of the 45 class centroids in ten-class problems are shown to be almost equal to the value predicted from the average radius of the class centroids. The class-conditional distributions of the patterns are compared using two measures of divergence. The difference between the distributions of the same class with different feature sets is found to be larger than the difference between the distributions of different classes with the same feature set. This suggests that the correlation among features of patterns of one class can predict the correlation among features of patterns in another class. The amount of within-source consistency in a data set is quantified using an entropy measure that takes into account small-sample effects. The statistical dependence between the features of same-source patterns of different classes is measured by mutual information applied to the discrete distributions resulting from quantization of the style assignments. 
If these observations are supported by further studies of symbolic and natural patterns with diverse data sets, they may eventually lead to improved classification methods for same-source ensembles of symbolic patterns.
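The standardization and the simplex prediction described in the summary can be illustrated with a minimal sketch. It is not the authors' code: the function names (`whiten`, `simplex_edge_from_radius`, `centroid_geometry`) and the synthetic ten-class Gaussian data are assumptions made purely for illustration. The sketch relies on two standard facts: after whitening, the mean squared radius equals the dimensionality d (the trace of the identity covariance), and k points that form a regular simplex at distance R from their common center are separated by R * sqrt(2k / (k - 1)).

```python
import numpy as np


def whiten(X, eps=1e-10):
    """Shift to zero mean, rotate onto principal axes, scale to unit variance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > eps  # drop directions with (numerically) zero variance
    return (Xc @ eigvecs[:, keep]) / np.sqrt(eigvals[keep])


def simplex_edge_from_radius(radius, k):
    """Edge length of a regular simplex with k vertices at distance `radius`
    from its center."""
    return radius * np.sqrt(2.0 * k / (k - 1))


def centroid_geometry(Z, labels):
    """Return the radius of each class centroid and all pairwise centroid distances."""
    classes = np.unique(labels)
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    radii = np.linalg.norm(centroids, axis=1)
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    upper = np.triu_indices(len(classes), k=1)
    return radii, dists[upper]


# Hypothetical synthetic stand-in for a ten-class digit data set.
rng = np.random.default_rng(0)
k, d, n_per_class = 10, 20, 200
class_means = 3.0 * rng.normal(size=(k, d))
mixing = np.eye(d) + 0.3 * rng.normal(size=(d, d))  # correlates the features
X = np.concatenate(
    [m + rng.normal(size=(n_per_class, d)) @ mixing for m in class_means]
)
labels = np.repeat(np.arange(k), n_per_class)

Z = whiten(X)
pattern_radius = np.linalg.norm(Z, axis=1)
centroid_radii, pairwise = centroid_geometry(Z, labels)
predicted_edge = simplex_edge_from_radius(centroid_radii.mean(), k)

print("mean squared pattern radius:", np.mean(pattern_radius**2), "vs d =", d)
print("number of centroid pairs:", len(pairwise))
print("mean observed centroid distance:", pairwise.mean())
print("simplex prediction from mean centroid radius:", predicted_edge)
print("relative spread of pairwise distances:", pairwise.std() / pairwise.mean())
```

On real data, the quantity of interest is how tightly the 45 pairwise distances cluster around the predicted edge length. The synthetic class means used here are random, so this run only demonstrates the computation and is not evidence for or against the simplex prediction.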

Author information

Authors and Affiliations

Electrical, Computer, & Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
George Nagy
DocLab, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Xiaoli Zhang

Editor information

Editors and Affiliations

Electrical Engineering Department, City College, City University of New York, USA
Mitra Basu PhD
Bell Laboratories, Lucent Technologies, New Jersey, USA
Tin Kam Ho BBA, MS, PhD

About this chapter

Cite this chapter

Nagy, G., Zhang, X. (2006). Simple Statistics for Complex Feature Spaces. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_9

DOI: https://doi.org/10.1007/978-1-84628-172-3_9
Publisher Name: Springer, London
Print ISBN: 978-1-84628-171-6
Online ISBN: 978-1-84628-172-3
eBook Packages: Computer Science, Computer Science (R0)
