Abstract
This paper addresses the well-established problem of unsupervised object discovery with a novel method inspired by weakly-supervised approaches. In particular, the ability of an object patch to predict the rest of the object (its context) is used as supervisory signal to help discover visually consistent object clusters. The main contributions of this work are: 1) framing unsupervised clustering as a leave-one-out context prediction task; 2) evaluating the quality of context prediction by statistical hypothesis testing between thing and stuff appearance models; and 3) an iterative region prediction and context alignment approach that gradually discovers a visual object cluster together with a segmentation mask and fine-grained correspondences. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and is shown to provide correspondences detailed enough to transfer keypoint annotations.
Chapter PDF
Similar content being viewed by others
References
Hinton, G.E., Dayan, P., Frey, B.J., Neal, R.M.: The “wake-sleep” algorithm for unsupervised neural networks. IEEE Proceedings (1995)
Olshausen, B.A., et al.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature (1996)
Le, Q.V.: Building high-level features using large scale unsupervised learning. In: ICASSP (2013)
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: CVPR (2003)
Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: ICCV (2005)
Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: CVPR (2006)
Lee, Y.J., Grauman, K.: Foreground focus: Unsupervised learning from partially matching images. IJCV (2009)
Payet, N., Todorovic, S.: From a set of shapes to object discovery. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 57–70. Springer, Heidelberg (2010)
Kim, G., Faloutsos, C., Hebert, M.: Unsupervised modeling of object categories using link analysis techniques. In: CVPR (2008)
Grauman, K., Darrell, T.: Unsupervised learning of categories from sets of partially matching image features. In: CVPR (2006)
Faktor, A., Irani, M.: “Clustering by composition” – unsupervised discovery of image categories. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 474–487. Springer, Heidelberg (2012)
Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: CVPR (2013)
Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: CVPR (2011)
Doersch, C., Gupta, A., Efros, A.A.: Mid-level visual element discovery as discriminative mode seeking. In: NIPS (2013)
Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 73–86. Springer, Heidelberg (2012)
Doersch, C., Singh, S., Gupta, A., Sivic, J., Efros, A.A.: What makes Paris look like Paris? In: SIGGRAPH (2012)
Endres, I., Shih, K., Jiaa, J., Hoiem, D.: Learning collections of part models for object recognition. In: CVPR (2013)
Jain, A., Gupta, A., Rodriguez, M., Davis, L.: Representing videos using mid-level discriminative patches. In: CVPR (2013)
Juneja, M., Vedaldi, A., Jawahar, C.V., Zisserman, A.: Blocks that shout: Distinctive parts for scene classification. In: CVPR (2013)
Li, Q., Wu, J., Tu, Z.: Harvesting mid-level visual concepts from large-scale internet images. In: CVPR (2013)
Sun, J., Ponce, J.: Learning discriminative part detectors for image classification and cosegmentation. In: ICCV (2013)
Wang, X., Wang, B., Bai, X., Liu, W., Tu, Z.: Max-margin multiple-instance dictionary learning. In: ICML (2013)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional neural networks. ArXiv preprint ArXiv:1311.2901 (2013)
Oliva, A., Torralba, A.: The role of context in object recognition. Trends in Cognitive Sciences (2007)
Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: CVPR (2000)
Adelson, E.H.: On seeing stuff: The perception of materials by humans and machines. In: Photonics West 2001-Electronic Imaging, International Society for Optics and Photonics (2001)
Efros, A.A., Leung, T.K.: Texture synthesis by non-parametric sampling. In: ICCV (1999)
Munroe, R.: xkcd, a webcomic of romance, sarcasm, math and language. Creative Commons Attribution-Noncommercial (2014)
Hariharan, B., Malik, J., Ramanan, D.: Discriminative decorrelation for clustering and classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. ECCV, vol. 7575, pp. 459–472. Springer, Heidelberg (2012)
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. CVIU (2007)
Wang, G., Zhang, Y., Fei-Fei, L.: Using dependent regions for object categorization in a generative framework. In: CVPR (2006)
Sudderth, E.B., Torralba, A., Freeman, W.T., Willsky, A.S.: Learning hierarchical models of scenes, objects, and parts. In: ICCV (2005)
Vondrick, C., Khosla, A., Malisiewicz, T., Torralba, A.: HOG-gles: Visualizing object detection features. In: ICCV (2013)
Song, H.O., Girshick, R., Jegelka, S., Mairal, J., Harchaoui, Z., Darrell, T.: On learning to localize objects with minimal supervision. In: ICML (2014)
Hejrati, M., Ramanan, D.: Analyzing 3d objects in cluttered images. In: NIPS (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Doersch, C., Gupta, A., Efros, A.A. (2014). Context as Supervisory Signal: Discovering Objects with Predictable Context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8691. Springer, Cham. https://doi.org/10.1007/978-3-319-10578-9_24
Download citation
DOI: https://doi.org/10.1007/978-3-319-10578-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10577-2
Online ISBN: 978-3-319-10578-9
eBook Packages: Computer ScienceComputer Science (R0)