Sliding Shapes for 3D Object Detection in Depth Images

Song, Shuran; Xiao, Jianxiong

doi:10.1007/978-3-319-10599-4_41


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8694)

Included in the following conference series:

European Conference on Computer Vision

19k Accesses
144 Citations
9 Altmetric

Abstract

The depth information from RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs in several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely variations in texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion, and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a 3D detection window in 3D space. Experimental results show that our 3D detector significantly outperforms the state-of-the-art algorithms for both RGB and RGB-D images, and achieves about a 1.7× improvement in average precision compared to DPM and R-CNN. All source code and data are available online.
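The detection step described above (scoring a 3D window at every position in 3D space) can be illustrated with a minimal sketch. It assumes a simple binary voxel-occupancy feature and a single linear template, with hypothetical `weights` and `bias` standing in for one trained Exemplar-SVM. The paper's actual 3D features, window sizes, rendering, and training procedure are not reproduced here, and the helper names `voxelize`, `window_feature`, and `slide_3d` are hypothetical.

```python
from itertools import product


def voxelize(points, cell, dims):
    """Map 3D points (x, y, z) to a binary occupancy grid of shape dims.

    Points falling outside the grid on any axis are ignored.
    """
    nx, ny, nz = dims
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in points:
        i, j, k = int(x // cell), int(y // cell), int(z // cell)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            grid[i][j][k] = 1
    return grid


def window_feature(grid, origin, size):
    """Flatten the occupancy values inside one 3D window into a vector."""
    oi, oj, ok = origin
    si, sj, sk = size
    return [grid[oi + a][oj + b][ok + c]
            for a, b, c in product(range(si), range(sj), range(sk))]


def slide_3d(grid, weights, bias, size, threshold):
    """Score every window position with a linear template (w . f + b).

    Hypothetical stand-in for one Exemplar-SVM; returns (score, origin)
    pairs whose score exceeds threshold, highest score first.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    si, sj, sk = size
    hits = []
    for origin in product(range(nx - si + 1),
                          range(ny - sj + 1),
                          range(nz - sk + 1)):
        f = window_feature(grid, origin, size)
        score = sum(w * v for w, v in zip(weights, f)) + bias
        if score > threshold:
            hits.append((score, origin))
    hits.sort(reverse=True)
    return hits


if __name__ == "__main__":
    # Eight points filling a 2x2x2 block of 0.25 m cells.
    pts = [(x, y, z) for x in (0.375, 0.625)
           for y in (0.375, 0.625) for z in (0.125, 0.375)]
    grid = voxelize(pts, 0.25, (4, 4, 3))
    # All-ones template: fires only where all 8 window cells are occupied.
    print(slide_3d(grid, [1.0] * 8, -7.5, (2, 2, 2), 0.0))
    # prints [(0.5, (1, 1, 0))]
```

Exhaustive enumeration is the simplest correct form of a sliding window; a practical detector would use richer features per cell and a faster scoring scheme, neither of which this sketch attempts.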


Author information

Authors and Affiliations

Princeton University, USA
Shuran Song & Jianxiong Xiao

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Song, S., Xiao, J. (2014). Sliding Shapes for 3D Object Detection in Depth Images. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_41

DOI: https://doi.org/10.1007/978-3-319-10599-4_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science (R0)