Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio

Liang, Junwei; Jin, Qin; He, Xixi; Yang, Gang; Xu, Jieping; Li, Xirong

doi:10.1007/978-3-319-13168-9_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8879)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

With the increasing use of audio sensors in user-generated content (UGC) collection, semantic concept annotation from audio streams has become an important research problem. Huawei initiated a grand challenge at the International Conference on Multimedia & Expo (ICME) 2014: the Huawei Accurate and Fast Mobile Video Annotation Challenge. In this paper, we present our semantic concept annotation system for the Huawei challenge, which uses the audio stream only. The system extracts the audio stream from the video data and then extracts low-level acoustic features from that stream. A bag-of-features representation is generated from the low-level features and used as the input feature to train a support vector machine (SVM) concept classifier. The experimental results show that our audio-only concept annotation system detects semantic concepts significantly better than random guessing. It also provides important complementary information to a visual-based concept annotation system, which boosts performance.
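The bag-of-features step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the codebook is built or which acoustic features are used, so the codebook here is simply assumed to be given (in practice it is often learned by clustering), the two-dimensional frames are made-up toy data, and the function names `nearest_codeword` and `bag_of_features` are hypothetical. SVM training is omitted; the resulting histogram is the kind of fixed-length vector such a classifier would take as input.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to one feature frame."""
    return min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))


def bag_of_features(frames, codebook):
    """Quantize frame-level features and return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = sum(counts)
    # An empty segment yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (assumed for illustration): 2-D "acoustic" frames, 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = bag_of_features(frames, codebook)
print(histogram)  # prints [0.25, 0.5, 0.25]
```

Whatever the number of frames in a segment, the histogram has one entry per codeword, which is what lets segments of different lengths share a single classifier.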

Author information

Authors and Affiliations

Multimedia Computing Lab, School of Information, Renmin University of China, China
Junwei Liang, Qin Jin, Xixi He, Gang Yang, Jieping Xu & Xirong Li

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, 117417, Singapore
Wei Tsang Ooi
Informatics Institute, Intelligent Systems Lab Amsterdam (ISLA), University of Amsterdam, Science Park 904, 1098 GH, Amsterdam, The Netherlands
Cees G. M. Snoek
Department of Computer Science, Universiti Tunku Abdul Rahman, 31900, Kampar, Perak, Malaysia
Hung Khoon Tan
Faculty of Computing and Informatics, Persiaran Multimedia, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Chin-Kuan Ho
EURECOM, Campus Sophia Tech, 450 route des Chappes, 06904, Sophia Antipolis, France
Benoit Huet
Department of Computer Science, City University of Hong Kong, Tat Chee Ave, Kowloon, Hong Kong, China
Chong-Wah Ngo

About this paper

Cite this paper

Liang, J., Jin, Q., He, X., Yang, G., Xu, J., Li, X. (2014). Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio. In: Ooi, W.T., Snoek, C.G.M., Tan, H.K., Ho, CK., Huet, B., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2014. PCM 2014. Lecture Notes in Computer Science, vol 8879. Springer, Cham. https://doi.org/10.1007/978-3-319-13168-9_12

DOI: https://doi.org/10.1007/978-3-319-13168-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13167-2
Online ISBN: 978-3-319-13168-9
eBook Packages: Computer Science, Computer Science (R0)
