Abstract
We formulate a deformable template model for objects with an efficient mechanism for computation and parameter estimation. The data consists of binary oriented edge features, which are robust to photometric variation and small local deformations. The template is defined in terms of probability arrays, one for each edge type. A primary contribution of this paper is to define the instantiation of an object in terms of shifts of a moderate number of local submodels (parts), which are subsequently recombined using a patchwork operation to define a coherent statistical model of the data. Object classes are modeled as mixtures of patchwork of parts (POP) models that are discovered sequentially as more class data is observed. We define the notion of the support associated with an instantiation, and use it to formulate statistical models for multi-object configurations, including possible occlusions. All decisions on the labeling of the objects in the image are based on comparing likelihoods. The combination of a deformable model with an efficient estimation procedure yields competitive results in a variety of applications with very small training sets and without the need to train decision boundaries: only data from the class being trained is used. Experiments are presented on the MNIST database, on reading zip codes, and on face detection.
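As a rough illustration of the ideas in the abstract, the following sketch builds a probability array from shifted parts and labels a binary edge map by comparing likelihoods. It is not the authors' implementation. The abstract does not specify the patchwork operation, so averaging overlapping parts and filling uncovered pixels with a background probability are assumptions made here, as are the names `patchwork`, `log_likelihood`, `classify`, and the value of `BACKGROUND`. For brevity the sketch uses a single edge type.

```python
import math

BACKGROUND = 0.05  # assumed background edge probability (hypothetical value)


def patchwork(shape, parts, shifts, background=BACKGROUND):
    """Combine shifted part probability arrays into one probability array.

    Assumption (not stated in the abstract): where parts overlap, their
    probabilities are averaged; pixels covered by no part get `background`.
    parts:  list of (top, left, 2-D list of edge probabilities)
    shifts: list of (dy, dx), one shift per part
    """
    rows, cols = shape
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for (top, left, probs), (dy, dx) in zip(parts, shifts):
        for i, row in enumerate(probs):
            for j, p in enumerate(row):
                r, c = top + dy + i, left + dx + j
                if 0 <= r < rows and 0 <= c < cols:
                    total[r][c] += p
                    count[r][c] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else background
         for c in range(cols)]
        for r in range(rows)
    ]


def log_likelihood(edges, probs, eps=1e-6):
    """Bernoulli log-likelihood of a binary edge map under a probability array."""
    ll = 0.0
    for e_row, p_row in zip(edges, probs):
        for x, p in zip(e_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            ll += math.log(p) if x else math.log(1.0 - p)
    return ll


def classify(edges, models):
    """Return the label whose probability array gives the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(edges, models[label]))


# Toy usage: one horizontal "bar" part placed at two different shifts.
part = (0, 0, [[0.9, 0.9]])
models = {
    "top": patchwork((2, 2), [part], [(0, 0)]),
    "bottom": patchwork((2, 2), [part], [(1, 0)]),
}
image = [[1, 1], [0, 0]]  # binary edge map with edges in the top row
label = classify(image, models)
```

Because each class is scored only by its own likelihood, no decision boundary between classes is trained in this sketch, which mirrors the point made in the abstract.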
Cite this article
Amit, Y., Trouvé, A. POP: Patchwork of Parts Models for Object Recognition. Int J Comput Vis 75, 267–282 (2007). https://doi.org/10.1007/s11263-006-0033-9
DOI: https://doi.org/10.1007/s11263-006-0033-9