Human Detection Using Learned Part Alphabet and Pose Dictionary

Yao, Cong; Bai, Xiang; Liu, Wenyu; Latecki, Longin Jan

doi:10.1007/978-3-319-10602-1_17

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8693)

Included in the following conference series:

European Conference on Computer Vision

Abstract

As structured data, the human body and text are similar in many respects. In this paper, we exploit this analogy to build a compositional model for human detection in natural scenes, introducing basic concepts and mature techniques from text recognition. A discriminative alphabet, each grapheme of which is a mid-level element representing a body part, is learned automatically from bounding box labels. Based on this alphabet, the flexible structure of the human body is expressed as symbolic sequences, which correspond to various human poses and allow robust, efficient matching. A pose dictionary, constructed from training examples, is used to verify hypotheses at runtime. Experiments on standard benchmarks demonstrate that the proposed algorithm achieves state-of-the-art or competitive performance.
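
The idea of verifying a pose hypothesis against a dictionary of symbolic sequences can be illustrated with a minimal sketch. The abstract does not state which matching measure is used, so this example assumes a length-normalized edit (Levenshtein) distance. The part symbols ("H", "T", "L", "A"), the function names, and the threshold value are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def verify_hypothesis(
    hypothesis: Sequence[str],
    dictionary: List[Sequence[str]],
    max_normalized_distance: float = 0.3,  # assumed threshold, not from the paper
) -> Optional[Tuple[Sequence[str], float]]:
    """Return the closest dictionary pose and its normalized distance,
    or None if no entry is close enough to accept the hypothesis."""
    best: Optional[Tuple[Sequence[str], float]] = None
    for pose in dictionary:
        longest = max(len(hypothesis), len(pose))
        dist = edit_distance(hypothesis, pose) / longest if longest else 0.0
        if best is None or dist < best[1]:
            best = (pose, dist)
    if best is not None and best[1] <= max_normalized_distance:
        return best
    return None


# Hypothetical pose dictionary: each entry is a sequence of part symbols.
pose_dictionary = [("H", "T", "L", "L"), ("A", "A", "A", "A")]
match = verify_hypothesis(("H", "T", "L"), pose_dictionary)
```

Here a detected sequence with one missing part still matches the first dictionary entry, which is the kind of tolerance to missed or spurious part detections that sequence matching is meant to provide. A sequence far from every entry is rejected.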

Author information

Authors and Affiliations

Department of Electronics and Information Engineering, Huazhong University of Science and Technology, China
Cong Yao, Xiang Bai & Wenyu Liu
Department of Computer and Information Sciences, Temple University, USA
Longin Jan Latecki

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Yao, C., Bai, X., Liu, W., Latecki, L.J. (2014). Human Detection Using Learned Part Alphabet and Pose Dictionary. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_17

DOI: https://doi.org/10.1007/978-3-319-10602-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science, Computer Science (R0)
