Abstract
This work introduces a framework for the regularized and robust estimation of non-uniform dimensionality and density in high dimensional noisy data. This leads to learning stratifications, that is, mixtures of manifolds representing different characteristics and complexities in the data set. The basic idea is to model the high dimensional sample points as a translated Poisson mixture process, with regularizing restrictions, leading to a model that accounts for the presence of noise. The translated Poisson distribution, derived from the noise-induced translation of a regular Poisson distribution, is well suited to modeling a noisy counting process. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that the sequence of all possible local counts in a point cloud formed by samples of a stratification can be modeled by a mixture of different translated Poisson distributions, thus allowing mixed dimensionalities and densities in the same data set. With this statistical model, the parameters that best describe the data, estimated via expectation maximization, divide the points into different classes according to both dimensionality and density, together with an estimate of these quantities for each class. Theoretical asymptotic results for the model are presented as well. The theoretical framework is complemented with artificial and real examples showing the importance of regularized stratification learning in high dimensional data analysis in general, and in computer vision and image analysis in particular.
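The core estimation step described above, modeling the number of points falling into a local ball as a Poisson counting process and maximizing its log-likelihood, can be sketched in its simplest noise-free form. The sketch below implements the classical Levina–Bickel maximum-likelihood local dimension estimator, a close relative of the per-point estimate underlying the mixture model, not the full translated-Poisson EM of the paper; the function name and parameter choices are illustrative.

```python
import numpy as np

def local_dimension_mle(points, k=10):
    """Per-point MLE of intrinsic dimension from a Poisson model of
    local neighbor counts (Levina-Bickel estimator).

    For each point x_i, let T_1 <= ... <= T_k be the distances to its
    k nearest neighbors. The Poisson-process log-likelihood is maximized at
        m(x_i) = [ (1/(k-1)) * sum_{j=1}^{k-1} log(T_k / T_j) ]^{-1}.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dims = np.empty(n)
    for i in range(n):
        # Distances from x_i to all points, sorted; drop the zero self-distance.
        d = np.sort(np.linalg.norm(pts - pts[i], axis=1))[1:k + 1]
        # Inverse mean of log-distance ratios gives the local dimension MLE.
        dims[i] = 1.0 / np.mean(np.log(d[-1] / d[:-1]))
    return dims

# Example: samples of a 1-D curve (a helix) embedded in 3-D.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 10.0, 500)
curve = np.stack([np.cos(t), np.sin(t), t], axis=1)
est = local_dimension_mle(curve, k=15)
print(np.median(est))  # close to 1: the helix is locally one-dimensional
```

In the paper's setting, these per-point estimates are not used directly; instead the counts from all points are pooled into a mixture of translated Poisson distributions, and EM jointly assigns points to classes while estimating each class's dimension and density.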
Cite this article
Haro, G., Randall, G. & Sapiro, G. Translated Poisson Mixture Model for Stratification Learning. Int J Comput Vis 80, 358–374 (2008). https://doi.org/10.1007/s11263-008-0144-6