Abstract
We investigate the role of sparsity and localized features in a biologically inspired model of visual object classification. As in the model of Serre, Wolf, and Poggio, we first apply Gabor filters at all positions and scales; feature complexity and position/scale invariance are then built up by alternating template matching and max pooling operations. We refine the approach in several biologically plausible ways. We increase sparsity by constraining the number of feature inputs, applying lateral inhibition, and performing feature selection. We also demonstrate the value of retaining some position and scale information above the intermediate feature level. Our final model is competitive with current computer vision algorithms on several standard datasets, including the Caltech 101 object categories and the UIUC car localization task. The results further the case for biologically motivated approaches to object classification.
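The first two stages named in the abstract, Gabor filtering followed by local max pooling, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`gabor_kernel`, `filter_layer`, `max_pool`), the kernel size, wavelength, and pooling parameters are all assumptions chosen for illustration, and the sketch works at a single scale only, so pooling over scale is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size=11, wavelength=5.6, sigma=4.5, theta=0.0, gamma=0.3):
    """Zero-mean, unit-norm Gabor kernel (parameter values are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoid varies along orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    g = envelope * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)


def filter_layer(image, kernels):
    """Apply every kernel at every valid position.

    Returns absolute responses with shape (n_kernels, H', W').
    """
    size = kernels[0].shape[0]
    patches = sliding_window_view(image, (size, size))  # (H', W', size, size)
    return np.stack(
        [np.abs(np.einsum("ijkl,kl->ij", patches, k)) for k in kernels]
    )


def max_pool(responses, pool=4, stride=2):
    """Local max over pool x pool neighbourhoods, separately per orientation."""
    windows = sliding_window_view(responses, (pool, pool), axis=(1, 2))
    return windows[:, ::stride, ::stride].max(axis=(-2, -1))


# Usage on a random test image with four orientations.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernels = [gabor_kernel(theta=t) for t in np.arange(4) * np.pi / 4]
s1 = filter_layer(image, kernels)   # filter responses
c1 = max_pool(s1)                   # locally position-tolerant responses
print(s1.shape, c1.shape)  # (4, 22, 22) (4, 10, 10)
```

Taking the maximum over a neighbourhood is what gives the pooled layer some tolerance to small shifts: a strong filter response that moves by a pixel or two still lands in the same pooling window. A fuller model along the lines the abstract describes would repeat the filter-then-pool pattern with learned templates and would also pool across scales.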
References
Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), 1475–1490.
Berg, A. C., Berg, T. L., & Malik, J. (2005). Shape matching and object recognition using low distortion correspondence. In CVPR, June 2005.
Bouchard, G., & Triggs, B. (2005). Hierarchical part-based visual object categorization. In CVPR, June 2005.
Csurka, G., Dance, C., Willamowski, J., Fan, L., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV international workshop on statistical learning in computer vision, Prague, 2004.
DiCarlo, J., & Cox, D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11, 333–341.
Epshtein, B., & Ullman, S. (2005). Feature hierarchies for object classification. In ICCV, Beijing.
Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR workshop on generative-model based vision.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In CVPR.
Figueiredo, M. (2003). Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1150–1159.
Franc, V., & Hlavac, V. (2004). Statistical pattern recognition toolbox for Matlab, version 2.04.
Fritz, M., Leibe, B., Caputo, B., & Schiele, B. (2005). Integrating representative and discriminative models for object category detection. In ICCV (pp. 1363–1370), Beijing, China, October 2005.
Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Grauman, K., & Darrell, T. (2006). Pyramid match kernels: discriminative classification with sets of image features (Technical Report MIT-CSAIL-TR-2006-020), March 2006.
Holub, A., Welling, M., & Perona, P. (2005). Exploiting unlabeled data for hybrid object classification. In NIPS workshop on inter-class transfer, Whistler, BC, December 2005.
Hubel, D., & Wiesel, T. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of Physiology, 148, 574–591.
Knoblich, U., Bouvrie, J., & Poggio, T. (2007). Biophysical models of neural computation: max and tuning circuits (Technical Report CBCL paper), April 2007.
Krishnapuram, B., Carin, L., Figueiredo, M., & Hartemink, A. (2005). Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 957–968.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In CVPR, June 2006.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In ECCV workshop on statistical learning in computer vision (pp. 17–32), Prague, Czech Republic, May 2004.
Logothetis, N., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.
Mladenic, D., Brank, J., Grobelnik, M., & Milic-Frayling, N. (2004). Feature selection using linear classifier weights: interaction with classification models. In The 27th annual international ACM SIGIR conference (SIGIR 2004) (pp. 234–241), Sheffield, UK, July 2004.
Moosmann, F., Triggs, B., & Jurie, F. (2006). Randomized clustering forests for building fast and discriminative visual vocabularies. In Neural information processing systems (NIPS), November 2006.
Mutch, J., & Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. In CVPR (pp. 11–18), New York, June 2006.
Olshausen, B., & Field, D. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Opelt, A., Pinz, A., Fussenegger, M., & Auer, P. (2006). Generic object recognition with boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3).
Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.
Potter, M. (1975). Meaning in visual search. Science, 187, 965–966.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.
Rolls, E. T., & Deco, G. (2001). The computational neuroscience of vision. Oxford: Oxford University Press.
Serre, T., Kouh, M., Cadieu, C., Knoblich, U., Kreiman, G., & Poggio, T. (2005). A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex (Technical Report CBCL Paper #259/AI Memo #2005-036). Massachusetts Institute of Technology, Cambridge, MA, October 2005.
Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In CVPR, San Diego, June 2005.
Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 682–687.
Zhang, H., Berg, A., Maire, M., & Malik, J. (2006). SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In CVPR, June 2006.
Additional information
This paper updates and extends an earlier presentation (Mutch and Lowe 2006) of this research in CVPR 2006.
J. Mutch’s research described in this paper was carried out at the University of British Columbia.
Cite this article
Mutch, J., Lowe, D.G. Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields. Int J Comput Vis 80, 45–57 (2008). https://doi.org/10.1007/s11263-007-0118-0
DOI: https://doi.org/10.1007/s11263-007-0118-0