Discriminative Models for Multi-Class Object Layout

Desai, Chaitanya; Ramanan, Deva; Fowlkes, Charless C.

doi:10.1007/s11263-011-0439-x

Published: 02 April 2011

Volume 95, pages 1–12 (2011)

International Journal of Computer Vision

Chaitanya Desai¹,
Deva Ramanan¹ &
Charless C. Fowlkes¹

1524 Accesses
155 Citations

Abstract

Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. This allows one to leverage sophisticated machine learning techniques for training classifiers from labeled examples. However, these models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically.
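To make the post-processing step concrete, the following is a minimal sketch of greedy non-maxima suppression as it is commonly implemented, not the specific heuristic of any detector discussed in the article. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS (threshold is an assumption)."""
    # Visit detections from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The threshold here is fixed by hand, which is the kind of heuristic choice the abstract contrasts with a learned model.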

We introduce a unified model for multi-class object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image window independently, our model simultaneously predicts a structured labeling of the entire image (Fig. 1). Our model learns statistics that capture the spatial arrangements of various object classes in real images, both in terms of which arrangements to suppress through NMS and which arrangements to favor through spatial co-occurrence statistics.
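One way to picture a structured labeling of the whole image is a score that sums per-window detector scores and learned weights on pairs of selected windows. The sketch below assumes that decomposition and a simple greedy search over candidate windows; the data layout, the symmetric relation names (`"overlap"`, `"near"`), and all function names are hypothetical and are not taken from the article.

```python
from itertools import combinations


def layout_score(selected, local_score, pair_weight, relation):
    """Score a set of detections labeled as present.

    selected:    list of detection ids
    local_score: dict id -> (class_name, detector_score)
    pair_weight: dict (class_a, class_b, relation_name) -> weight,
                 with class_a <= class_b (symmetric relations assumed)
    relation:    function (id_a, id_b) -> relation_name (hypothetical)
    """
    total = sum(local_score[i][1] for i in selected)
    for a, b in combinations(selected, 2):
        ca, cb = sorted((local_score[a][0], local_score[b][0]))
        total += pair_weight.get((ca, cb, relation(a, b)), 0.0)
    return total


def greedy_layout(candidates, local_score, pair_weight, relation):
    """Add the detection with the largest score gain until no gain remains."""
    selected, remaining, current = [], list(candidates), 0.0
    while remaining:
        best = max(
            remaining,
            key=lambda i: layout_score(selected + [i], local_score,
                                       pair_weight, relation),
        )
        new = layout_score(selected + [best], local_score, pair_weight, relation)
        if new - current <= 0:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected
```

In this toy form, a negative weight on two overlapping windows of the same class plays the role of NMS, while a positive weight on a pair of classes in a given relation rewards a favorable co-occurrence.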

We formulate parameter estimation in our model as a max-margin learning problem. Given training images with ground-truth object locations, we show how to formulate learning as a convex optimization problem. We employ the cutting plane algorithm of Joachims et al. (Mach. Learn. 2009) to efficiently learn a model from thousands of training images. We show state-of-the-art results on the PASCAL VOC benchmark that indicate the benefits of learning a global model encapsulating the spatial layout of multiple object classes (a preliminary version of this work appeared in ICCV 2009, Desai et al., IEEE international conference on computer vision, 2009).
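For orientation, a max-margin structured learning problem of this kind is usually written in the margin-rescaling form used by structural SVMs. The notation below is an assumption, since the abstract does not fix symbols: $w$ is the weight vector, $\Psi(X, Y)$ a joint feature map of image $X$ and labeling $Y$, $(X_i, Y_i)$ the training pairs, $\ell$ a loss between labelings, and $C$ a regularization constant.

```latex
\begin{aligned}
\min_{w,\ \xi \ge 0} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \xi_{i} \\
\text{s.t.} \quad & w^{\top}\Psi(X_i, Y_i) - w^{\top}\Psi(X_i, H)
  \;\ge\; \ell(Y_i, H) - \xi_{i}
  \qquad \forall i,\ \forall H .
\end{aligned}
```

The objective is quadratic and every constraint is linear in $(w, \xi)$, so the problem is convex. The number of constraints grows with the number of possible labelings $H$, which is why a cutting plane method that adds only the most violated constraints at each iteration is a natural fit.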

References

Anguelov, D., Taskar, B., Chatalbashev, V., Koller, D., Gupta, D., Heitz, G., & Ng, A. (2005). Discriminative learning of Markov random fields for segmentation of 3D scan data. In CVPR, II (pp. 169–176).
Baur, R., Efros, A. A., & Hebert, M. (2008). Statistics of 3D object locations in images (Tech. Rep. CMU-RI-TR-08-43). Robotics Institute, Pittsburgh, PA.
Blaschko, M. B., & Lampert, C. H. (2008). Learning to localize objects with structured output regression. In ECCV (pp. 2–15). Berlin: Springer.
Choi, M., Lim, J., Torralba, A., & Willsky, A. (2010). Exploiting hierarchical context on a large database of object categories. In IEEE conference on computer vision and pattern recognition, CVPR.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR I (pp. 886–893).
Desai, C., Ramanan, D., & Fowlkes, C. (2009). Discriminative models for multi-class object layout. In IEEE international conference on computer vision.
Desai, C., Ramanan, D., & Fowlkes, C. (2010). Discriminative models for static human-object interactions. In Workshop on structured prediction in computer vision, CVPR.
Divvala, S., Hoiem, D., Hays, J., & Efros, A. (2009). An empirical study of context in object detection. In CVPR.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop.
Felzenszwalb, P. (2008). http://people.cs.uchicago.edu/pff/latent.
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In CVPR.
Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In Proceedings of the 25th international conference on machine learning (pp. 304–311). New York: ACM.
Franc, V. (2006). http://cmp.felk.cvut.cz/xfrancv/libqp/html.
Galleguillos, C., Rabinovich, A., & Belongie, S. (2008). Object categorization using co-occurrence, location and appearance. In CVPR, Anchorage, AK.
Hall, E. (1966). The hidden dimension. New York: Anchor Books.
He, X., Zemel, R., & Carreira-Perpinan, M. (2004). Multiscale conditional random fields for image labeling. In CVPR (Vol. 2). Los Alamitos: IEEE Comput. Soc.
Hoiem, D., Efros, A., & Hebert, M. (2008). Putting objects in perspective. IJCV, 80(1), 3–15.
Joachims, T., Finley, T., & Yu, C. (2009). Cutting plane training of structural SVMs. Machine Learning, 77(1), 27–59.
Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1568–1583. http://doi.ieeecomputersociety.org/10.1109/TPAMI.2006.200.
Kumar, S., & Hebert, M. (2005). A hierarchical field framework for unified context-based classification. In Tenth IEEE international conference on computer vision, ICCV, 2005 (Vol. 2).
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Workshop on statistical learning in computer vision, ECCV (pp. 17–32).
Liu, Y., Lin, W., & Hays, J. (2004). Near-regular texture analysis and manipulation. ACM Transactions on Graphics, 23(3), 368–376.
Meltzer, T. (2006). http://www.cs.huji.ac.il/talyam/inference.html.
MSR (2006). http://research.microsoft.com/en-us/downloads/dad6c31e-2c04-471f-b724-ded18bf70fe3/.
Murphy, K., Torralba, A., & Freeman, W. (2003). Using the forest to see the trees: a graphical model relating features, objects and scenes. NIPS 16.
Nemhauser, G., Wolsey, L., & Fisher, M. (1978). An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1), 265–294.
Park, D., Ramanan, D., & Fowlkes, C. (2010). Multiresolution models for object detection. In ECCV.
Rother, C., Kolmogorov, V., Lempitsky, V., & Szummer, M. (2007). Optimizing binary MRFs via extended roof duality. In CVPR.
Rowley, H. A., Baluja, S., & Kanade, T. (1996). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 23–38.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. Lecture Notes in Computer Science, 3951, 1.
Sudderth, E., Torralba, A., Freeman, W., & Willsky, A. (2005). Learning hierarchical models of scenes, objects, and parts. In ICCV, II (pp. 1331–1338).
Teo, C., Smola, A., Vishwanathan, S., & Le, Q. (2007). A scalable modular convex solver for regularized risk minimization. In SIGKDD. New York: ACM.
Torralba, A., Murphy, K., & Freeman, W. (2004). Contextual models for object detection using boosted random fields. NIPS.
Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In ICML. New York: ACM.
Tu, Z. (2008). Auto-context and its application to high-level vision tasks. In CVPR.
Viola, P. A., & Jones, M. J. (2004). Robust real-time face detection. IJCV, 57(2), 137–154.
Wainwright, M., Jaakkola, T., & Willsky, A. (2002). MAP estimation via agreement on (hyper)trees: message-passing and linear programming approaches. IEEE Transactions on Information Theory, 51, 3697–3717.
Yanover, C., Meltzer, T., & Weiss, Y. (2006). Linear programming relaxations and belief propagation - an empirical study. In JMLR (pp. 1887–1907).

Author information

Authors and Affiliations

Department of Computer Science, UC Irvine, Irvine, CA, USA
Chaitanya Desai, Deva Ramanan & Charless C. Fowlkes

Corresponding author

Correspondence to Chaitanya Desai.

Additional information

The Marr Prize is awarded to the best paper(s) at the biennial flagship vision conference, the IEEE International Conference on Computer Vision (ICCV). This paper is an extended and re-reviewed journal version of the 2009 prize-winning conference paper.

About this article

Cite this article

Desai, C., Ramanan, D. & Fowlkes, C.C. Discriminative Models for Multi-Class Object Layout. Int J Comput Vis 95, 1–12 (2011). https://doi.org/10.1007/s11263-011-0439-x

Received: 02 March 2010
Accepted: 18 March 2011
Published: 02 April 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s11263-011-0439-x
