Context as Supervisory Signal: Discovering Objects with Predictable Context

Doersch, Carl; Gupta, Abhinav; Efros, Alexei A.

doi:10.1007/978-3-319-10578-9_24

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8691)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This paper addresses the well-established problem of unsupervised object discovery with a novel method inspired by weakly-supervised approaches. In particular, the ability of an object patch to predict the rest of the object (its context) is used as a supervisory signal to help discover visually consistent object clusters. The main contributions of this work are: 1) framing unsupervised clustering as a leave-one-out context prediction task; 2) evaluating the quality of context prediction by statistical hypothesis testing between thing and stuff appearance models; and 3) an iterative region prediction and context alignment approach that gradually discovers a visual object cluster together with a segmentation mask and fine-grained correspondences. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and is shown to provide correspondences detailed enough to transfer keypoint annotations.
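The leave-one-out idea in contributions 1 and 2 can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract does not specify the appearance models: here each cluster member's context is a small feature vector, the "thing" model is assumed to be an isotropic Gaussian centred on the mean context of the other members, and the "stuff" model is assumed to be a Gaussian centred on a generic background mean. The names `gaussian_log_pdf`, `leave_one_out_scores`, `stuff_mean`, and `sigma` are hypothetical. A positive log-likelihood ratio means the rest of the cluster predicts the held-out context better than the background model does.

```python
import math


def gaussian_log_pdf(x, mean, sigma):
    """Log density of an isotropic Gaussian, summed over feature dimensions."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (xi - mi) ** 2 / (2 * sigma ** 2)
        for xi, mi in zip(x, mean)
    )


def leave_one_out_scores(contexts, stuff_mean, sigma=1.0):
    """Log-likelihood ratio (thing vs. stuff) for each held-out member.

    Assumption: the "thing" prediction for member i is simply the mean
    context of all other members; the "stuff" prediction is a fixed
    background mean shared by every member.
    """
    n = len(contexts)
    dim = len(contexts[0])
    scores = []
    for i, held_out in enumerate(contexts):
        # Hold out member i and predict its context from the rest.
        others = [c for j, c in enumerate(contexts) if j != i]
        thing_mean = [sum(c[d] for c in others) / (n - 1) for d in range(dim)]
        llr = gaussian_log_pdf(held_out, thing_mean, sigma) - gaussian_log_pdf(
            held_out, stuff_mean, sigma
        )
        scores.append(llr)
    return scores


# Three members with mutually consistent context and one outlier.
contexts = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [-3.0, 0.0]]
stuff_mean = [0.0, 0.0]
scores = leave_one_out_scores(contexts, stuff_mean)
for i, s in enumerate(scores):
    print(f"member {i}: log-likelihood ratio = {s:+.3f}")
```

In this toy data the three consistent members receive positive scores and the outlier a negative one, so thresholding the ratio at zero would keep the former and reject the latter. A real system would replace the Gaussian means with learned appearance models over image patches.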

Author information

Authors and Affiliations

Carnegie Mellon University, USA
Carl Doersch & Abhinav Gupta
UC Berkeley, USA
Alexei A. Efros

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Doersch, C., Gupta, A., Efros, A.A. (2014). Context as Supervisory Signal: Discovering Objects with Predictable Context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_24

DOI: https://doi.org/10.1007/978-3-319-10578-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9
eBook Packages: Computer Science, Computer Science (R0)
