Abstract
This paper presents a method for learning reconfigurable hierarchical And-Or models that integrate context and occlusion for car detection. The And-Or model represents the regularities of car-to-car context and occlusion patterns at three levels: (i) layouts of N spatially coupled cars, (ii) single cars with different viewpoint-occlusion configurations, and (iii) a small number of parts. Learning proceeds in two stages. In the first stage, we learn the structure of the And-Or model, which has three components: (a) mining N-car contextual patterns from the layouts of annotated single-car bounding boxes, (b) mining occlusion configurations from the overlap statistics between single cars, and (c) learning visible parts, either from car 3D CAD simulation or by heuristically mining latent car parts. The And-Or model is organized as a directed acyclic graph, which allows inference by dynamic programming. In the second stage, we jointly train the model parameters (for appearance, deformation and bias) using Weak-Label Structural SVM. In experiments, we test our model on four car datasets: the KITTI dataset [11], the street parking dataset [19], the PASCAL VOC2007 car dataset [7], and a self-collected parking lot dataset. We compare with state-of-the-art variants of deformable part-based models and other methods. Our model consistently obtains significant improvements on all four datasets.
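As an illustration only, the following minimal sketch shows how dynamic programming can score a parse over an And-Or directed acyclic graph. It is not the authors' implementation. The node names, the bias values, the terminal scores, and the `best_score` function are all hypothetical, and the sketch ignores spatial placement and deformation, which the actual model scores. And-nodes sum their children's scores, Or-nodes select the best child, and memoization ensures that a shared node is scored once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str              # "and", "or", or "terminal"
    children: tuple = ()   # names of child nodes
    bias: float = 0.0      # hypothetical bias term added at this node


def best_score(graph, root, terminal_scores):
    """Return (score, parse) for the best parse rooted at `root`.

    Memoization makes this a dynamic program: each node of the
    DAG is evaluated once even when several parents share it.
    """
    memo = {}

    def visit(name):
        if name in memo:
            return memo[name]
        node = graph[name]
        if node.kind == "terminal":
            result = (terminal_scores[name] + node.bias, [name])
        elif node.kind == "and":
            # An And-node composes all of its children.
            total, parse = node.bias, [name]
            for child in node.children:
                score, sub_parse = visit(child)
                total += score
                parse = parse + sub_parse
            result = (total, parse)
        elif node.kind == "or":
            # An Or-node selects its best-scoring alternative.
            score, sub_parse = max(
                (visit(child) for child in node.children),
                key=lambda pair: pair[0],
            )
            result = (score + node.bias, [name] + sub_parse)
        else:
            raise ValueError(f"unknown node kind: {node.kind}")
        memo[name] = result
        return result

    return visit(root)


# Hypothetical toy graph: a 2-car layout versus a single car.
# "car_full" is shared by both alternatives, so the graph is a DAG.
GRAPH = {
    "root": Node("root", "or", ("two_car_layout", "single_car")),
    "two_car_layout": Node("two_car_layout", "and",
                           ("car_full", "car_occluded"), bias=-0.5),
    "single_car": Node("single_car", "and", ("car_full",)),
    "car_full": Node("car_full", "and", ("front_part", "rear_part")),
    "car_occluded": Node("car_occluded", "and", ("front_part",), bias=-0.25),
    "front_part": Node("front_part", "terminal"),
    "rear_part": Node("rear_part", "terminal"),
}
# Made-up appearance scores for the two part templates.
TERMINAL_SCORES = {"front_part": 1.0, "rear_part": 0.5}

SCORE, PARSE = best_score(GRAPH, "root", TERMINAL_SCORES)
print(SCORE, PARSE)
```

With these made-up numbers, the Or-node at the root selects the 2-car layout because its composed score exceeds that of the single-car branch. In a full detector, each node would instead hold a score map over image positions and scales.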
References
Azizpour, H., Laptev, I.: Object detection using strongly-supervised deformable part models. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 836–849. Springer, Heidelberg (2012)
Behley, J., Steinhage, V., Cremers, A.: Laser-based Segment Classification Using a Mixture of Bag-of-Words. In: IROS (2013)
Branson, S., Perona, P., Belongie, S.: Strong supervision from weak annotation: Interactive training of deformable part models. In: ICCV (2011)
Chen, G., Ding, Y., Xiao, J., Han, T.X.: Detection evolution with multi-order contextual co-occurrence. In: CVPR (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for multi-class object layout. IJCV 95(1), 1–12 (2011)
Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. IJCV (2010)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI (2010)
Felzenszwalb, P., McAllester, D.: Object detection grammars. Tech. rep., University of Chicago, Computer Science TR-2010-02 (2010)
Felzenszwalb, P., Huttenlocher, D.: Distance transforms of sampled functions. Theory of Computing (2012)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: CVPR (2012)
Geiger, A., Wojek, C., Urtasun, R.: Joint 3D estimation of objects and scene layout. In: NIPS (2011)
Girshick, R., Felzenszwalb, P., McAllester, D.: Object detection with grammar models. In: NIPS (2011)
Girshick, R.B., Felzenszwalb, P.F., McAllester, D.: Discriminatively trained deformable part models, release 5, http://people.cs.uchicago.edu/~rbg/latent-release5/
Hejrati, M., Ramanan, D.: Analyzing 3D objects in cluttered images. In: NIPS (2012)
Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 340–353. Springer, Heidelberg (2012)
Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. IJCV 80(1), 3–15 (2008)
Hu, W., Zhu, S.C.: Learning 3D object templates by quantizing geometry and appearance spaces. TPAMI (to appear, 2014)
Li, B., Hu, W., Wu, T.F., Zhu, S.C.: Modeling occlusion by discriminative and-or structures. In: ICCV (2013)
Li, B., Song, X., Wu, T.F., Hu, W., Pei, M.: Coupling-and-decoupling: A hierarchical model for occlusion-free object detection. PR 47, 3254–3264 (2014)
Mathias, M., Benenson, R., Timofte, R., Van Gool, L.: Handling occlusions with franken-classifiers. In: ICCV (2013)
McAllester, D., Keshet, J.: Generalization bounds and consistency for latent structural probit and ramp loss. In: NIPS (2011)
Ouyang, W., Wang, X.: Single-pedestrian detection aided by multi-pedestrian detection. In: CVPR (2013)
Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3d geometry to deformable part models. In: CVPR (2012)
Pepik, B., Stark, M., Gehler, P., Schiele, B.: Occlusion patterns for object class detection. In: CVPR (2013)
Sadeghi, M., Farhadi, A.: Recognition using visual phrases. In: CVPR (2011)
Song, X., Wu, T.F., Jia, Y., Zhu, S.C.: Discriminatively trained and-or tree models for object detection. In: CVPR (2013)
Tang, S., Andriluka, M., Schiele, B.: Detection and tracking of occluded people. In: BMVC (2012)
Tu, Z., Bai, X.: Auto-context and its application to high-level vision tasks and 3D brain image segmentation. TPAMI (2010)
Yang, Y., Baker, S., Kannan, A., Ramanan, D.: Recognizing proxemics in personal photos. In: CVPR (2012)
Zhu, L., Chen, Y., Yuille, A., Freeman, W.: Latent hierarchical structural learning for object detection. In: CVPR (2010)
Zhu, S.C., Mumford, D.: A stochastic grammar of images. Found. Trends. Comput. Graph. Vis. (2006)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, B., Wu, T., Zhu, SC. (2014). Integrating Context and Occlusion for Car Detection by Hierarchical And-Or Model. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_42
DOI: https://doi.org/10.1007/978-3-319-10599-4_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science, Computer Science (R0)