Abstract
This paper addresses the problem of fully automated mining of public space video data, a highly desirable capability under contemporary commercial and security considerations. The task is especially challenging because of the complexity of the object behaviours to be profiled, the difficulty of analysis under the visual occlusions and ambiguities common in public space video, and the computational challenge of doing so in real time. We address these issues by introducing a new dynamic topic model, termed a Markov Clustering Topic Model (MCTM). The MCTM builds on existing dynamic Bayesian network models and Bayesian topic models, and overcomes their drawbacks in sensitivity, robustness and efficiency. Specifically, our model profiles complex dynamic scenes by robustly clustering visual events into activities and these activities into global behaviours with temporal dynamics. A Gibbs sampler is derived for offline learning with unlabeled training data, and a new approximation to online Bayesian inference is formulated to enable dynamic scene understanding and behaviour mining in new video data in real time. The strength of this model is demonstrated by unsupervised learning of dynamic scene models for four complex and crowded public scenes, and successful mining of behaviours and detection of salient events in each.
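To make the hierarchy described above concrete, the following is a minimal illustrative sketch, not the model specification or inference scheme of the paper. It assumes a three-level structure in which one behaviour per video clip follows a first-order Markov chain, each visual event in a clip is assigned an activity drawn from a behaviour-specific distribution, and each activity emits a quantised visual word. The parameter names `psi`, `theta` and `phi`, the toy values, and the helper functions are all hypothetical. The filter shown is the standard exact forward recursion for known parameters, standing in for the approximate online inference that the abstract mentions but does not detail.

```python
import math
import random

# Toy parameters (all values are illustrative assumptions, strictly positive).
# psi[z_prev][z]: behaviour transition probabilities (Markov chain over clips).
psi = [[0.9, 0.1],
       [0.1, 0.9]]
# theta[z][y]: probability of activity y under behaviour z.
theta = [[0.8, 0.1, 0.1],
         [0.1, 0.1, 0.8]]
# phi[y][x]: probability of visual word x under activity y.
phi = [[0.7, 0.1, 0.1, 0.1],
       [0.1, 0.7, 0.1, 0.1],
       [0.1, 0.1, 0.1, 0.7]]


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def generate_clips(n_clips, events_per_clip, rng):
    """Sample one behaviour per clip and a bag of visual words per clip."""
    behaviours, clips = [], []
    z = rng.randrange(len(psi))  # assumed uniform starting behaviour
    for _ in range(n_clips):
        z = sample_categorical(psi[z], rng)
        words = []
        for _ in range(events_per_clip):
            y = sample_categorical(theta[z], rng)   # activity for this event
            words.append(sample_categorical(phi[y], rng))
        behaviours.append(z)
        clips.append(words)
    return behaviours, clips


def clip_log_likelihood(words, z):
    """log p(words | behaviour z), with activities marginalised out."""
    total = 0.0
    for x in words:
        total += math.log(sum(theta[z][y] * phi[y][x] for y in range(len(phi))))
    return total


def forward_filter(clips):
    """Return p(z_t | clips up to t) for each clip, computed one clip at a time."""
    n_b = len(psi)
    belief = [1.0 / n_b] * n_b  # assumed uniform prior before the first clip
    posteriors = []
    for words in clips:
        # Predict: push the previous belief through the transition matrix.
        predicted = [sum(belief[zp] * psi[zp][z] for zp in range(n_b))
                     for z in range(n_b)]
        # Update: weight by the clip likelihood, working in log space.
        log_post = [math.log(predicted[z]) + clip_log_likelihood(words, z)
                    for z in range(n_b)]
        m = max(log_post)
        unnorm = [math.exp(lp - m) for lp in log_post]
        s = sum(unnorm)
        belief = [u / s for u in unnorm]
        posteriors.append(belief)
    return posteriors


rng = random.Random(0)
true_behaviours, clips = generate_clips(n_clips=20, events_per_clip=30, rng=rng)
posteriors = forward_filter(clips)
```

In this sketch each update touches only the current clip and the previous belief, which is the property that makes per-clip filtering suitable for streaming use. A clip whose words are poorly explained by every behaviour would receive a low likelihood, which is one plausible way to flag salient events, though the scoring used in the paper is not stated in the abstract.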
Cite this article
Hospedales, T., Gong, S. & Xiang, T. Video Behaviour Mining Using a Dynamic Topic Model. Int J Comput Vis 98, 303–323 (2012). https://doi.org/10.1007/s11263-011-0510-7