Abstract
A scheme for recognizing 3D objects from single 2D images under orthographic projection is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a class is very low, because correspondence is computed once for the whole class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series of simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.
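The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that objects are 3D point sets whose rows already correspond to the rows of the image points (the paper computes this correspondence; here it is taken as given), that a view is approximated by a least-squares 2x4 affine camera, and that similarity is an RMS point distance. The names `fit_affine_pose`, `recognize`, `make_demo`, and the class and model labels are all hypothetical.

```python
import numpy as np


def fit_affine_pose(points3d, image2d):
    """Least-squares 2x4 affine camera mapping homogeneous 3D points to 2D."""
    X = np.hstack([points3d, np.ones((len(points3d), 1))])  # n x 4
    pose_t, *_ = np.linalg.lstsq(X, image2d, rcond=None)    # 4 x 2
    return pose_t.T                                         # 2 x 4


def project(pose, points3d):
    """Apply a 2x4 affine camera to an n x 3 point set."""
    X = np.hstack([points3d, np.ones((len(points3d), 1))])
    return X @ pose.T


def rms_error(a, b):
    """Root-mean-square distance between two n x 2 point sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def recognize(image2d, classes, threshold):
    """Return (class_name, model_name), or (None, None) if no class fits."""
    # Stage 1 (categorization): align every prototype with the image
    # and keep the class whose best view is closest to the image.
    best = None
    for name, cls in classes.items():
        pose = fit_affine_pose(cls["prototype"], image2d)
        err = rms_error(project(pose, cls["prototype"]), image2d)
        if best is None or err < best[1]:
            best = (name, err, pose)
    class_name, err, pose = best
    if err > threshold:
        return None, None

    # Stage 2 (identification): reuse the prototype's pose for every
    # model of the class, so each comparison is a plain point-set
    # comparison with no further pose or correspondence estimation.
    models = classes[class_name]["models"]
    errors = {m: rms_error(project(pose, pts), image2d)
              for m, pts in models.items()}
    return class_name, min(errors, key=errors.get)


def make_demo(seed=0, n_points=30, spread=0.05):
    """Synthetic data (assumption): models are noisy copies of a prototype."""
    rng = np.random.default_rng(seed)
    classes = {}
    for name in ("chair", "table"):
        prototype = rng.normal(size=(n_points, 3))
        models = {
            f"{name}_{i}": prototype + spread * rng.normal(size=(n_points, 3))
            for i in range(3)
        }
        classes[name] = {"prototype": prototype, "models": models}

    # Rotate one model, drop the depth coordinate (orthographic
    # projection), and translate it in the image plane.
    a, b = 0.4, -0.7
    ry = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(b), -np.sin(b)],
                   [0.0, np.sin(b), np.cos(b)]])
    rot = rx @ ry
    image = classes["chair"]["models"]["chair_1"] @ rot[:2].T
    image = image + np.array([2.0, -1.0])
    return classes, image
```

Calling `recognize(image, classes, threshold=0.3)` on the output of `make_demo()` runs both stages. The pose is estimated once per class, against the prototype only, which is the cost saving the abstract describes. The sketch does not enforce a rigid pose and does not construct optimal prototypes.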
Cite this article
Basri, R. Recognition by prototypes. Int J Comput Vision 19, 147–167 (1996). https://doi.org/10.1007/BF00055802
DOI: https://doi.org/10.1007/BF00055802