Abstract
Matching people based on their imaged face is hard because of the well-known problems of illumination, pose, size and expression variation. Indeed, these variations can exceed those due to identity. Fortunately, videos of people contain multiple exemplars of each person in a form that can easily be associated automatically using straightforward visual tracking. We describe progress in harnessing these multiple exemplars in order to retrieve humans automatically in videos, given a query face in a shot. There are three areas of interest: (i) the matching of sets of exemplars provided by “tubes” of the spatio-temporal volume; (ii) the description of the face using a spatial orientation field; and (iii) the structuring of the problem so that retrieval is immediate at run time.
The result is a person retrieval system, able to retrieve a ranked list of shots containing a particular person in the manner of Google. The method has been implemented and tested on two feature-length movies.
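As a rough illustration of points (i) and (iii), the sketch below matches sets of face exemplars and moves the expensive comparisons offline, so that answering a query reduces to sorting one row of a precomputed table. It is not the paper's method: the minimum pairwise Euclidean distance, the two-dimensional toy descriptors, and the names `set_distance`, `precompute_distances` and `retrieve` are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]
FaceSet = List[Descriptor]  # all exemplars of one tracked face within one shot


def set_distance(a: FaceSet, b: FaceSet) -> float:
    """Minimum Euclidean distance over all exemplar pairs (an assumed measure)."""
    return min(math.dist(x, y) for x in a for y in b)


def precompute_distances(face_sets: Dict[str, FaceSet]) -> Dict[str, Dict[str, float]]:
    """Offline step: compute the distance between every pair of face sets."""
    ids = list(face_sets)
    table: Dict[str, Dict[str, float]] = {i: {} for i in ids}
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            d = set_distance(face_sets[i], face_sets[j])
            table[i][j] = d
            table[j][i] = d
    return table


def retrieve(query_id: str, table: Dict[str, Dict[str, float]]) -> List[str]:
    """Run-time step: rank the other shots by precomputed distance to the query."""
    row = table[query_id]
    return sorted(row, key=row.get)


# Toy data: hypothetical 2-D descriptors standing in for real face descriptors.
face_sets = {
    "shot_a": [(0.0, 0.0), (1.0, 0.0)],
    "shot_b": [(1.1, 0.0), (5.0, 5.0)],
    "shot_c": [(9.0, 9.0)],
}
table = precompute_distances(face_sets)
ranking = retrieve("shot_a", table)
print(ranking)
```

With the table built ahead of time, the cost of a query depends only on the number of face sets, not on the number of exemplars in each set.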
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sivic, J., Everingham, M., Zisserman, A. (2005). Person Spotting: Video Shot Retrieval for Face Sets. In: Leow, WK., Lew, M.S., Chua, TS., Ma, WY., Chaisorn, L., Bakker, E.M. (eds) Image and Video Retrieval. CIVR 2005. Lecture Notes in Computer Science, vol 3568. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526346_26
DOI: https://doi.org/10.1007/11526346_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27858-0
Online ISBN: 978-3-540-31678-7
eBook Packages: Computer Science, Computer Science (R0)