3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints

Rothganger, Fred; Lazebnik, Svetlana; Schmid, Cordelia; Ponce, Jean

doi:10.1007/s11263-005-3674-1

Published: March 2006

Volume 66, pages 231–259 (2006)

International Journal of Computer Vision

Abstract.

This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.
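As a rough illustration of the kind of geometric constraint that affine projection places on multiple views of the same points, the sketch below checks a well-known rank property: when the same 3D points are seen in several affine views, the stacked and row-centered matrix of image coordinates has rank at most 3. This is a generic textbook property and not necessarily the formulation used in the article; all names (`affine_project`, `centered_measurement_matrix`, `matrix_rank`) and the random test data are assumptions made for the example.

```python
import random

def affine_project(A, b, X):
    """Affine camera: x = A X + b, with A a 2x3 matrix and b a 2-vector."""
    return [sum(A[r][k] * X[k] for k in range(3)) + b[r] for r in range(2)]

def centered_measurement_matrix(cameras, points):
    """Stack image coordinates of all points in all views (2m x n),
    subtracting each row's mean to cancel the translation b."""
    rows = []
    for A, b in cameras:
        proj = [affine_project(A, b, X) for X in points]
        for r in range(2):
            row = [p[r] for p in proj]
            mean = sum(row) / len(row)
            rows.append([v - mean for v in row])
    return rows

def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            f = M[r][c] / M[rank][c]
            for k in range(c, n_cols):
                M[r][k] -= f * M[rank][k]
        rank += 1
    return rank

# Hypothetical data: 12 random 3D points seen by 4 random affine cameras.
rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(12)]
cameras = [
    ([[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)],
     [rng.uniform(-1, 1) for _ in range(2)])
    for _ in range(4)
]

W = centered_measurement_matrix(cameras, points)
rank_consistent = matrix_rank(W)

# Corrupt one image coordinate to mimic a wrong match between views.
W_bad = [row[:] for row in W]
W_bad[0][0] += 0.5
rank_corrupted = matrix_rank(W_bad)

print(rank_consistent, rank_corrupted)
```

A consistent set of matches keeps the rank at 3, while a single corrupted coordinate raises it, which is the sense in which such a constraint can help reject incorrect matches.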

References

Ayache, N. and Faugeras, O.D. 1986. Hyper: A new approach for the recognition and positioning of two-dimensional objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):44–54.
Baker, S. and Kanade, T. 2002. Limits on super-resolution and how to break them. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1167–1183.
Baumberg, A. 2000. Reliable feature matching across widely separated views. In Conference on Computer Vision and Pattern Recognition, pp. 774–781.
Belhumeur, P.N., Hespanha, J.P., and Kriegman, D.J. 1997. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720.
Blostein, D. and Ahuja, N. 1989. A multiscale region detector. Computer Vision, Graphics and Image Processing, 45:22–41.
Burns, J.B., Weiss, R.S., and Riseman, E.M. 1993. View variation of point-set and line-segment features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(1):51–68.
Capel, D. and Zisserman, A. 2001. Super-resolution from multiple views using learnt image models. In Conference on Computer Vision and Pattern Recognition.
Cheeseman, P., Kanefsky, B., Kraft, R., and Stutz, J. 1994. Super-resolved surface reconstruction from multiple images. Technical report, NASA Ames Research Center.
Crowley, J.L. and Parker, A.C. 1984. A representation of shape based on peaks and ridges in the difference of low-pass transform. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:156–170.
Duda, R.O., Hart, P.E., and Stork, D.G. 2001. Pattern Classification. 2nd edition. Wiley-Interscience.
Faugeras, O., Luong, Q.T., and Papadopoulo, T. 2001. The Geometry of Multiple Images. MIT Press.
Faugeras, O.D. and Hebert, M. 1986. The representation, recognition, and locating of 3-D objects. International Journal of Robotics Research, 5(3):27–52.
Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In Conference on Computer Vision and Pattern Recognition, vol. II, pp. 264–270.
Ferrari, V., Tuytelaars, T., and Van Gool, L. 2004. Simultaneous object recognition and segmentation by image exploration. In European Conference on Computer Vision.
Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography. Communications of the ACM, 24(6):381–395.
Forsyth, D. and Ponce, J. 2002. Computer Vision: A Modern Approach. Prentice-Hall.
Gårding, J. and Lindeberg, T. 1996. Direct computation of shape cues using scale-adapted spatial derivative operators. International Journal of Computer Vision, 17(2):163–191.
Grimson, W.E.L. 1990. The combinatorics of object recognition in cluttered environments using constrained search. Artificial Intelligence Journal, 44(1–2):121–166.
Grimson, W.E.L. and Lozano-Pérez, T. 1987. Localizing overlapping parts by searching the interpretation tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):469–482.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. In 4th Alvey Vision Conference, Manchester, UK, pp. 189–192.
Hartley, R. and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press.
Huttenlocher, D.P. and Ullman, S. 1987. Object recognition using alignment. In International Conference on Computer Vision, pp. 102–111.
Kadir, T. and Brady, M. 2001. Scale, saliency and image description. International Journal of Computer Vision, 45(2):83–105.
Koenderink, J.J. and van Doorn, A.J. 1991. Affine structure from motion. Journal of the Optical Society of America, 8(2):377–385.
Lamdan, Y. and Wolfson, H.J. 1988. Geometric hashing: A general and efficient model-based recognition scheme. In International Conference on Computer Vision, pp. 238–249.
Lamdan, Y. and Wolfson, H.J. 1991. On the error analysis of “geometric hashing.” In Conference on Computer Vision and Pattern Recognition. Maui, Hawaii, pp. 22–27.
Lindeberg, T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116.
Liu, J., Mundy, J., Forsyth, D., Zisserman, A., and Rothwell, C. 1993. Efficient recognition of rotationally symmetric surfaces and straight homogeneous generalized cylinders. In Conference on Computer Vision and Pattern Recognition. New York City, NY, pp. 123–128.
Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Lowe, D.G. 1987. The viewpoint consistency constraint. International Journal of Computer Vision, 1(1):57–72.
Mahamud, S. and Hebert, M. 2003. The optimal distance measure for object detection. In Conference on Computer Vision and Pattern Recognition.
Mahamud, S., Hebert, M., Omori, Y., and Ponce, J. 2001. Provably-convergent iterative methods for projective structure from motion. In Conference on Computer Vision and Pattern Recognition, pp. 1018–1025.
Matas, J., Chum, O., Urban, M., and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference, vol. I, pp. 384–393.
Mikolajczyk, K. and Schmid, C. 2001. Indexing based on scale invariant interest points. In International Conference on Computer Vision. Vancouver, Canada, pp. 525–531.
Mikolajczyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In European Conference on Computer Vision, vol. I. pp. 128–142.
Mikolajczyk, K. and Schmid, C. 2003. A performance evaluation of local descriptors. In Conference on Computer Vision and Pattern Recognition.
Moreels, P., Maire, M., and Perona, P. 2004. Recognition by probabilistic hypothesis construction. In European Conference on Computer Vision.
Mundy, J.L. and Zisserman, A. 1992. Geometric Invariance in Computer Vision. MIT Press.
Mundy, J.L., Zisserman, A., and Forsyth, D. 1994. Applications of Invariance in Computer Vision, vol. 825 of Lecture Notes in Computer Science. Springer-Verlag.
Murase, H. and Nayar, S.K. 1995. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14:5–24.
Nalwa, V.S. 1988. Line-drawing interpretation: A mathematical framework. International Journal of Computer Vision, 2:103–124.
Pentland, A., Moghaddam, B., and Starner, T. 1994. View-based and modular eigenspaces for face recognition. In Conference on Computer Vision and Pattern Recognition. Seattle, WA.
Poelman, C.J. and Kanade, T. 1997. A paraperspective factorization method for shape and motion recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(3):206–218.
Ponce, J. 2000. On computing metric upgrades of projective reconstructions under the rectangular pixel assumption. In Second SMILE Workshop, pp. 18–27.
Ponce, J., Chelberg, D., and Mann, W. 1989. Invariant properties of straight homogeneous generalized cylinders and their contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(9):951–966.
Pope, A.R. and Lowe, D.G. 2000. Probabilistic models of appearance for 3-D object recognition. International Journal of Computer Vision, 40(2):149–167.
Pritchett, P. and Zisserman, A. 1998. Wide baseline stereo matching. In International Conference on Computer Vision, Bombay, India, pp. 754–760.
Rothganger, F., Lazebnik, S., Schmid, C., and Ponce, J. 2003. 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints. In Conference on Computer Vision and Pattern Recognition, vol. II, pp. 272–277.
Rothganger, F., Lazebnik, S., Schmid, C., and Ponce, J. 2004. Segmenting, modeling, and matching video clips containing multiple moving objects. In Conference on Computer Vision and Pattern Recognition, Washington, DC, June 2004, Vol. 2, pp. 914–921.
Schaffalitzky, F. and Zisserman, A. 2002. Multi-view matching for unordered image sets, or “How do I organize my holiday snaps?”. In European Conference on Computer Vision, vol. I, pp. 414–431.
Schmid, C. and Mohr, R. 1997. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–535.
Schneiderman, H. and Kanade, T. 2000. A statistical method for 3D object detection applied to faces and cars. In Conference on Computer Vision and Pattern Recognition.
Selinger, A. and Nelson, R. 1999. A perceptual grouping hierarchy for appearance-based 3D object recognition. Computer Vision and Image Understanding, 76(1):83–92.
Tell, D. and Carlsson, S. 2000. Wide baseline point matching using affine invariants computed from intensity profiles. In Proc. 6th ECCV. Dublin, Ireland, pp. 814–828, Springer LNCS 1842–1843.
Thompson, D. and Mundy, J. 1987. Three-dimensional model matching from an unconstrained viewpoint. In International Conference on Robotics and Automation. Raleigh, NC, pp. 208–220.
Tomasi, C. and Kanade, T. 1992. Shape and motion from image streams: A factorization method. International Journal of Computer Vision, 9(2):137–154.
Torr, P. and Zisserman, A.Z. 2000. MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78(1):138–156.
Triggs, B., McLauchlan, P.F., Hartley, R.I., and Fitzgibbon, A.W. 1999. Bundle adjustment---A modern synthesis. In: B. Triggs, A. Zisserman, and R. Szeliski (Eds.), Vision Algorithms, Corfu, Greece, pp. 298–372, Springer-Verlag, LNCS 1883.
Turk, M. and Pentland, A. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86.
Tuytelaars, T. and Van Gool, L. 2004. Matching widely separated views based on affinely invariant neighbourhoods. International Journal of Computer Vision. (in press)
Voorhees, H. and Poggio, T. 1987. Detecting textons and texture boundaries in natural images. In International Conference on Computer Vision, pp. 250–258.
Weber, M., Welling, M., and Perona, P. 2000. Unsupervised learning of models for recognition. In European Conference on Computer Vision.
Weinshall, D. and Tomasi, C. 1995. Linear and incremental acquisition of invariant shape models from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):512–517.

Author information

Authors and Affiliations

Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Fred Rothganger & Svetlana Lazebnik
INRIA, Rhône-Alpes, 665, Avenue de l'Europe, 38330, Montbonnot, France
Cordelia Schmid
Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Jean Ponce

Corresponding author

Correspondence to Fred Rothganger.

Additional information

A preliminary version of this article appeared in Rothganger et al. (2003).

About this article

Cite this article

Rothganger, F., Lazebnik, S., Schmid, C. et al. 3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints. Int J Comput Vision 66, 231–259 (2006). https://doi.org/10.1007/s11263-005-3674-1

Received: 03 March 2004
Revised: 07 December 2004
Accepted: 28 March 2005
Issue Date: March 2006
DOI: https://doi.org/10.1007/s11263-005-3674-1
