Abstract
Several state-of-the-art Generic Visual Categorization (GVC) systems are built around a vocabulary of visual terms and characterize images with one histogram of visual word counts. We propose a novel and practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. An image is characterized by a set of histograms – one per class – where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. It is shown experimentally on three very different databases that this novel representation outperforms those approaches which characterize an image with a single histogram.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Amir, A., Argillander, J., Berg, M., Chang, S.-F., Franz, M., Hsu, W., Iyengar, G., Kender, J., Kennedy, L., Lin, C.-Y., Naphade, M., Natsev, A., Smith, J., Tesic, J., Wu, G., Yang, R., Zhang, D.: IBM research TRECVID-2004 video retrieval system. In: Proc. of TREC Video Retrieval Evaluation (2004)
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, Department of Electrical Engineering and Computer Science, UC Berkeley (1998)
Chen, Y., Wang, J.Z.: Image categorization by learning and reasoning with regions. Journal of Machine Mearning Research 5, 913–939 (2004)
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: Proc. of ECCV Workshop on Statistical Learning for Computer Vision (2004)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39(1), 1–38 (1977)
Deselaers, T., Keysers, D., Ney, H.: Classification error rate for quantitative evaluation of content-based image retrieval systems. In: Proc. of ICPR (2004)
Farquhar, J., Szedmak, S., Meng, H., Shawe-Taylor, J.: Improving “bag-of-keypoints” image categorisation. Technical report, University of Southampton (2005)
Gauvain, J.-L., Lee, C.-H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. on Speech and Audio Processing 2(2), 291–298 (1994)
Hsu, W.H., Chang, S.-F.: Visual cue cluster construction via information bottleneck principle and kernel density estimation. In: Proc. of CIVR (2005)
Leung, T., Malik, J.: Recognizing surfaces using three-dimensional textons. In: Proc. of ICCV (1999)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision 60(2), 91–110 (2004)
Reynolds, D., Quatieri, T., Dunn, R.: Speaker verification using adapted Gaussian mixture models. Digital Signal Processing 10, 19–41 (2000)
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
Sivic, J.S., Zisserman, A.: Video google: A text retrieval approach to object matching in videos. In: Proc. of ICCV, vol. 2, pp. 1470–1477 (2003)
Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. Int. Journal of Computer Vision 62(1–2), 61–81 (2005)
Winn, K., Criminisi, A., Minka, T.: Object categorization by learned visual dictionary. In: Proc. of ICCV (2005)
Zhang, J., Marszalek, M., Lazebnik, S., Schmid, C.: Local features and kernels for classification of texture and object categories: an in-depth study. INRIA, Research report 5737 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perronnin, F., Dance, C., Csurka, G., Bressan, M. (2006). Adapted Vocabularies for Generic Visual Categorization. In: Leonardis, A., Bischof, H., Pinz, A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol 3954. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11744085_36
Download citation
DOI: https://doi.org/10.1007/11744085_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33838-3
Online ISBN: 978-3-540-33839-0
eBook Packages: Computer ScienceComputer Science (R0)