VFM: Visual Feedback Model for Robust Object Recognition

Wang, Chong; Huang, Kai-Qi

doi:10.1007/s11390-015-1526-1

Regular Paper
Published: 13 March 2015

Journal of Computer Science and Technology, Volume 30, pages 325–339 (2015)

Abstract

Object recognition, which comprises classification and detection, has two attributes that are important for robustness: 1) closeness: detection windows should be as close to the object locations as possible, and 2) adaptiveness: object matching should adapt to object variations within a class. Traditional methods, which treat classification and detection separately, have difficulty satisfying both attributes, so recent studies combine the two tasks through confidence contextualization and foreground modeling. However, these combinations neglect feature saliency and object structure, which biological evidence suggests can guide recognition from low-level to high-level processing. In the human visual system, object recognition arises from the “what” and “where” pathways. More importantly, these pathways feed back to each other and exchange useful information, which may improve closeness and adaptiveness. Inspired by this visual feedback, we propose a robust object recognition framework built on a computational visual feedback model (VFM) between classification and detection. In the “what” feedback, feature saliency from classification is exploited to rectify detection windows for better closeness; in the “where” feedback, object parts from detection are used to match object structure for better adaptiveness. Experimental results show that the “what” and “where” feedback effectively improves closeness and adaptiveness for object recognition, with encouraging improvements on the challenging PASCAL VOC 2007 dataset.
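
The abstract does not say how closeness is scored or how saliency rectifies a window, so the following is only an illustrative sketch under stated assumptions, not the authors' VFM procedure. It assumes closeness is measured by intersection-over-union (IoU), the overlap ratio that the PASCAL VOC detection criterion thresholds at 50%. The function `rectify` and its `keep` parameter are hypothetical: `rectify` greedily trims whichever window border carries the least saliency, as long as at least a `keep` fraction of the in-window saliency mass is retained. The saliency map is a plain 2D list standing in for whatever classification-derived saliency a real system would produce.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def area(b: Box) -> int:
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (1.0 = identical, 0.0 = disjoint)."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, iw) * max(0, ih)
    return inter / (area(a) + area(b) - inter)


def rectify(window: Box, saliency: List[List[float]], keep: float = 0.95) -> Box:
    """Hypothetical saliency-guided tightening (NOT the paper's method).

    saliency[y][x] is a non-negative score per pixel. Borders are removed one
    row or column at a time, lowest-mass side first, while the retained mass
    stays >= keep * (mass inside the original window).
    """
    x1, y1, x2, y2 = window

    def col(x: int) -> float:  # saliency mass of column x inside the window
        return sum(saliency[y][x] for y in range(y1, y2 + 1))

    def row(y: int) -> float:  # saliency mass of row y inside the window
        return sum(saliency[y][x1:x2 + 1])

    total = sum(row(y) for y in range(y1, y2 + 1))
    if total <= 0:
        return window  # no saliency evidence: leave the window unchanged
    kept = total
    while True:
        candidates = []
        if x2 > x1:
            candidates += [(col(x1), "left"), (col(x2), "right")]
        if y2 > y1:
            candidates += [(row(y1), "top"), (row(y2), "bottom")]
        if not candidates:
            break
        mass, side = min(candidates)  # cheapest border to drop
        if kept - mass < keep * total:
            break  # dropping it would discard too much salient mass
        kept -= mass
        if side == "left":
            x1 += 1
        elif side == "right":
            x2 -= 1
        elif side == "top":
            y1 += 1
        else:
            y2 -= 1
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    # Toy map: saliency 1.0 on the object (x 3..6, y 2..7), 0.0 elsewhere.
    sal = [[1.0 if 3 <= x <= 6 and 2 <= y <= 7 else 0.0 for x in range(10)]
           for y in range(10)]
    gt, loose = (3, 2, 6, 7), (0, 0, 9, 9)
    tight = rectify(loose, sal)
    print(tight, iou(loose, gt), iou(tight, gt))
```

On this toy map, trimming the loose window (0, 0, 9, 9) down to the salient region raises its IoU with the ground-truth box (3, 2, 6, 7) from 0.24 to 1.0, turning a miss under the 50% overlap criterion into a hit. The greedy border trimming is chosen only for simplicity; any real rectification step would depend on how the saliency is computed.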

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Chong Wang & Kai-Qi Huang

Corresponding author

Correspondence to Kai-Qi Huang.

Additional information

Special Section on Object Recognition

This work was supported by the National Basic Research 973 Program of China under Grant No. 2012CB316302, the National Natural Science Foundation of China under Grant Nos. 61322209 and 61175007, and the National Key Technology Research and Development Program of China under Grant No. 2012BAH07B01.

About this article

Cite this article

Wang, C., Huang, KQ. VFM: Visual Feedback Model for Robust Object Recognition. J. Comput. Sci. Technol. 30, 325–339 (2015). https://doi.org/10.1007/s11390-015-1526-1

Received: 19 December 2014
Revised: 02 February 2015
Published: 13 March 2015
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11390-015-1526-1
