Abstract
Learning visual classifiers for object recognition from weakly labeled data requires determining the correspondence between image regions and semantic object classes. Most approaches use co-occurrence of "nouns" and image features over large datasets to determine the correspondence, but many correspondence ambiguities remain. We further constrain the correspondence problem by exploiting additional language constructs to improve the learning process from weakly labeled data. We consider both "prepositions" and "comparative adjectives," which are used to express relationships between objects. If the models of such relationships can be determined, they help resolve correspondence ambiguities. However, learning models of these relationships itself requires solving the correspondence problem. We simultaneously learn the visual features defining "nouns" and the differential visual features defining such "binary relationships" using an EM-based approach.
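The chicken-and-egg structure described above (relationship models need correspondences; correspondences need relationship models) can be illustrated with a small toy sketch. This is not the paper's actual model: it is a hard-assignment (k-means-style) variant of the EM idea, with hypothetical names (`em_learn`, `relation_score`) and the assumption that the last coordinate of each region's feature vector is its vertical position, so that an "above" relation can be scored geometrically.

```python
import itertools
import numpy as np

def relation_score(assign, regions, relations):
    """Score an assignment (noun -> region index) against relation triples.

    Toy convention (an assumption, not the paper's model): for a triple
    (a, "above", b), the region assigned to noun a should have a smaller
    vertical coordinate (last feature) than the region assigned to noun b.
    """
    s = 0.0
    for a, rel, b in relations:
        if rel == "above" and a in assign and b in assign:
            ya = regions[assign[a]][-1]
            yb = regions[assign[b]][-1]
            s += 1.0 if ya < yb else -1.0
    return s

def em_learn(images, nouns, n_iters=10, rel_weight=10.0, seed=0):
    """Alternate between (E) picking the best region-to-noun assignment per
    image, using both appearance and relation consistency, and (M) refitting
    each noun's appearance model (here: a mean feature vector)."""
    rng = np.random.default_rng(seed)
    dim = images[0]["regions"].shape[1]
    means = {n: rng.normal(size=dim) for n in nouns}
    for _ in range(n_iters):
        sums = {n: np.zeros(dim) for n in nouns}
        counts = {n: 0 for n in nouns}
        for img in images:
            R = img["regions"]
            best, best_s = None, -np.inf
            # E-step: enumerate one-to-one noun -> region assignments
            for perm in itertools.permutations(range(len(R)), len(img["nouns"])):
                assign = dict(zip(img["nouns"], perm))
                unary = -sum(np.sum((R[i] - means[n]) ** 2)
                             for n, i in assign.items())
                s = unary + rel_weight * relation_score(
                    assign, R, img.get("relations", []))
                if s > best_s:
                    best, best_s = assign, s
            for n, i in best.items():
                sums[n] += R[i]
                counts[n] += 1
        # M-step: refit noun means from the chosen regions
        for n in nouns:
            if counts[n]:
                means[n] = sums[n] / counts[n]
    return means

# Toy data: features are [brightness, vertical position]; "sky above sea"
# disambiguates which region corresponds to which noun.
images = [
    {"regions": np.array([[0.9, 0.1], [0.2, 0.9]]), "nouns": ["sky", "sea"],
     "relations": [("sky", "above", "sea")]},
    {"regions": np.array([[0.8, 0.2], [0.3, 0.8]]), "nouns": ["sky", "sea"],
     "relations": [("sky", "above", "sea")]},
]
means = em_learn(images, ["sky", "sea"])
```

In this sketch the relation term dominates the appearance term whenever the two conflict, so "sky" is consistently bound to the higher region and the noun models converge to the correct clusters; the paper's actual formulation uses soft EM responsibilities rather than hard assignments.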
The authors would like to thank Kobus Barnard for providing the Corel-5k dataset. The authors would also like to acknowledge VACE for supporting the research.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Gupta, A., Davis, L.S. (2008). Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88682-2_3
Print ISBN: 978-3-540-88681-5
Online ISBN: 978-3-540-88682-2