Competitive Video Retrieval with vitrivr

Rossetto, Luca; Giangreco, Ivan; Gasser, Ralph; Schuldt, Heiko

doi:10.1007/978-3-319-73600-6_41


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

This paper presents the competitive video retrieval capabilities of vitrivr. The vitrivr stack is the continuation of the IMOTION system, which has participated in the Video Browser Showdown competitions since 2015. The primary focus of vitrivr and its participation in this competition is to simplify and generalize the system’s individual components, making them easier to deploy and use. The entire vitrivr stack is made available as open source software.


1 Introduction

In this paper, we present the current iteration of vitrivr [6], an open-source content-based multimedia retrieval stack. The vitrivr stack is the continuation of the IMOTION system [3, 5, 7, 8], which participated in previous iterations of the Video Browser Showdown [1]. Despite offering some new functionality, the primary focus of this year's participation lies in the simplification and generalization of the retrieval stack in order to make it easier for both experts and laypeople to adapt, deploy, and use. The vitrivr stack is available in its entirety from https://vitrivr.org.

The remainder of this paper is structured as follows: Sect. 2 provides a brief overview of the overall system architecture and Sect. 3 summarizes all query types supported by vitrivr. Section 4 provides details on the functionalities introduced in the current version. In Sect. 5, we briefly outline our reasoning behind the open sourcing of vitrivr and Sect. 6 concludes.

2 Architectural Overview

The vitrivr stack – like its predecessor IMOTION – consists of three primary system components: the storage layer \(\textsf {ADAM}_{{pro}}\) [2], the retrieval engine Cineast [4], and a browser-based user interface. Additionally, a web server is used to serve static content such as videos and thumbnail images. Additional details on the architecture of the entire stack can be found in [6].

3 Interaction and Query Types

The vitrivr stack offers various ways in which queries can be specified. Most of them fall into two categories: visual and textual. The visual query modes include Query-by-Sketch, Query-by-Example, and Relevance Feedback, which are based on visual input such as user-generated sketches of a scene or one or more previously retrieved scenes. These queries are evaluated against data extracted directly from the video frames. The textual queries are based on information that can be extracted from the video content and represented as text, such as spoken language, on-screen text, or the provided textual video metadata. For this, we use the ASR data provided with the video data set as well as several object detectors that produce labels for the shots. OCR is applied to make text that appears on screen searchable as well.
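To illustrate how text derived from ASR, OCR, and object labels can make shots searchable, the following is a minimal sketch and not the actual Cineast implementation. The function names `build_index` and `search`, the source names, and the simple match-count ranking are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(shot_texts):
    """Map each lower-cased token to the (shot, source) pairs it occurs in.

    `shot_texts` is a hypothetical structure: shot id -> {source name -> text},
    where the source names ("asr", "ocr", "labels") are illustrative only.
    """
    index = defaultdict(set)
    for shot_id, sources in shot_texts.items():
        for source, text in sources.items():
            for token in text.lower().split():
                index[token].add((shot_id, source))
    return index


def search(index, query):
    """Rank shots by the number of (query term, source) matches, best first."""
    hits = defaultdict(int)
    for token in query.lower().split():
        for shot_id, _source in index.get(token, ()):
            hits[shot_id] += 1
    # Ties are broken by shot id so that the ranking is deterministic.
    return sorted(hits, key=lambda shot: (-hits[shot], shot))


shots = {
    "shot_1": {"asr": "welcome to the evening news", "ocr": "BREAKING NEWS",
               "labels": "person studio"},
    "shot_2": {"asr": "the dog runs", "labels": "dog grass"},
}
index = build_index(shots)
print(search(index, "news"))
```

A term that appears in several sources of the same shot, for example both spoken and shown on screen, counts once per source in this sketch, so such shots rank higher.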

4 New Functionality

While the IMOTION system that participated in previous instances of VBS was always a specialized piece of software, purpose-built for the competition, the functionality we added to vitrivr in preparation for this iteration of VBS is also useful for other use cases of vitrivr.

4.1 New User Interface

The most salient difference from the IMOTION system of the previous year is the new user interface. While still browser-based, the latest iteration of the UI is built on the Angular framework (see Note 1). Its modular structure makes it easy to customize the entire UI or parts thereof to shift its focus from general-purpose multimedia retrieval to, in this case, competitive video retrieval.

As in the past, the UI supports result streaming so that partial results can be presented to the user while the query is still being processed by the backend. However, it no longer achieves this via AJAX requests but instead uses a WebSocket connection to Cineast. A REST API is also available. The stack still includes a web server that provides static content such as shot thumbnails and the videos themselves, but this server is no longer required to act as a proxy between the browser and Cineast. The screenshot in Fig. 1 depicts the current version of the UI.
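The idea of result streaming can be sketched as follows. This is only an illustration: the JSON message layout (`type`, `feature`, `results`), the function names, and the max-score merging are assumptions and do not reflect the actual Cineast protocol. A plain Python generator stands in for the WebSocket connection.

```python
import json


def stream_results(per_feature_results):
    """Yield one JSON message per finished feature, then an end marker.

    Stands in for the backend side of a streaming connection; the message
    fields are hypothetical.
    """
    for feature, scored_shots in per_feature_results.items():
        yield json.dumps(
            {"type": "partial", "feature": feature, "results": scored_shots}
        )
    yield json.dumps({"type": "end"})


def consume(messages, on_update):
    """Merge partial results as they arrive, keeping the best score per shot."""
    best = {}
    for raw in messages:
        message = json.loads(raw)
        if message["type"] == "end":
            break
        for shot_id, score in message["results"]:
            best[shot_id] = max(score, best.get(shot_id, 0.0))
        # Re-rank after every message so partial results can be shown early.
        on_update(sorted(best.items(), key=lambda item: (-item[1], item[0])))
    return best


backend = stream_results({
    "color": [("shot_1", 0.4), ("shot_2", 0.9)],
    "edge": [("shot_1", 0.7)],
})
consume(backend, on_update=print)
```

Each call to `on_update` corresponds to a point at which a UI could refresh its result list, before the backend has finished evaluating all features.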

4.2 Approximate Retrieval

The underlying storage engine \(\textsf {ADAM}_{{pro}}\) [2] supports multiple index structures for efficient vector space retrieval. Many of these index structures achieve their high efficiency by approximating results rather than producing the true nearest neighbors of a query vector. In previous system iterations, we only made use of exact query results, which led to longer query times. In the current iteration of the system, the choice between exact and approximate queries can be made at query time. Hence, the user can sacrifice some accuracy to gain major speed-ups.
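The trade-off between exact and approximate retrieval can be illustrated with a toy example. The class below is a hypothetical sketch and does not correspond to any index structure or API of \(\textsf {ADAM}_{{pro}}\): the exact mode scans all vectors, whereas the approximate mode scans only the single coarse partition closest to the query and may therefore miss true nearest neighbors.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyVectorStore:
    """Hypothetical store with a brute-force scan and a coarse partition index."""

    def __init__(self, vectors, num_partitions=4, seed=0):
        self.vectors = list(vectors)
        rng = random.Random(seed)
        # Assumption: partition centres are sampled stored vectors, untrained.
        self.centres = rng.sample(self.vectors, num_partitions)
        self.partitions = [[] for _ in self.centres]
        for idx, vec in enumerate(self.vectors):
            self.partitions[self._nearest_centre(vec)].append(idx)

    def _nearest_centre(self, vec):
        return min(
            range(len(self.centres)),
            key=lambda c: distance(vec, self.centres[c]),
        )

    def knn(self, query, k, approximate=False):
        """Return (ids of the k best candidates, number of vectors scanned)."""
        if approximate:
            # Scan only the partition whose centre is closest to the query.
            candidates = self.partitions[self._nearest_centre(query)]
        else:
            candidates = range(len(self.vectors))
        ranked = sorted(candidates, key=lambda i: distance(query, self.vectors[i]))
        return ranked[:k], len(candidates)


rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
store = ToyVectorStore(data)
query = [0.5] * 8
print(store.knn(query, k=5))                    # exact: scans all 200 vectors
print(store.knn(query, k=5, approximate=True))  # approximate: scans one partition
```

The `approximate` flag is chosen per call, mirroring the idea that the decision is made at query time rather than when the collection is indexed.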

5 Open Source

The entire vitrivr stack [6] is published under the MIT license. The source code of all its components is available from their individual GitHub repositories (see Note 2), and additional documentation can be found at https://vitrivr.org. Being a general-purpose multimedia retrieval stack, vitrivr has many applications outside of competitive video retrieval, as it also supports other domains such as images, audio, and 3D models. With this flexible open source stack, we hope to offer the community the basis for future research in many areas and domains of multimedia retrieval.

6 Conclusions

With the competitive video retrieval version of vitrivr, we plan to continue the successful participation we had with the IMOTION system in the past. It is our hope that, by publishing the entire retrieval stack as open source software, we lower the entry hurdle for future participants and accelerate the testing of new ideas in the context of large-scale video retrieval.

Notes

1. https://angular.io/
2. https://github.com/vitrivr

References

1. Cobârzan, C., Schoeffmann, K., Bailer, W., Hürst, W., Blažek, A., Lokoč, J., Vrochidis, S., Barthel, K.W., Rossetto, L.: Interactive video search tools: a detailed analysis of the video browser showdown 2015. Multimedia Tools Appl. 76(4), 5539–5571 (2017)
2. Giangreco, I., Schuldt, H.: ADAM\(_{pro}\): database support for big multimedia retrieval. Datenbank-Spektrum 16(1), 17–26 (2016)
3. Rossetto, L., Giangreco, I., Heller, S., Tănase, C., Schuldt, H., Dupont, S., Seddati, O., Sezgin, M., Altıok, O.C., Sahillioğlu, Y.: IMOTION - searching for video sequences using multi-shot sketch. In: Tian, Q., Sebe, N., Qi, G.-J., Huet, B., Hong, R., Liu, X. (eds.) MMM 2016. LNCS, vol. 9517, pp. 377–382. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27674-8_36
4. Rossetto, L., Giangreco, I., Schuldt, H.: Cineast: a multi-feature sketch-based video retrieval engine. In: Proceedings of the 2014 IEEE International Symposium on Multimedia (ISM 2014), Taichung, Taiwan, pp. 18–23. IEEE Computer Society, December 2014
5. Rossetto, L., Giangreco, I., Schuldt, H., Dupont, S., Seddati, O., Sezgin, M., Sahillioğlu, Y.: IMOTION - a content-based video retrieval engine. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds.) MMM 2015. LNCS, vol. 8936, pp. 255–260. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14442-9_24
6. Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H.: vitrivr: a flexible retrieval stack supporting multiple query modes for searching in multimedia collections. In: Proceedings of the 2016 ACM Conference on Multimedia Conference (ACM MM 2016), Amsterdam, The Netherlands, pp. 1183–1186. ACM, October 2016
7. Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H.: Multimodal video retrieval with the 2017 IMOTION system. In: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (ICMR 2017), Bucharest, Romania, pp. 457–460. ACM, June 2017
8. Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H., Dupont, S., Seddati, O.: Enhanced retrieval and browsing in the IMOTION system. In: Amsaleg, L., Guðmundsson, G.Þ., Gurrin, C., Jónsson, B.Þ., Satoh, S. (eds.) MMM 2017. LNCS, vol. 10133, pp. 469–474. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-51814-5_43


Acknowledgements

This work was partly supported by the Chist-Era project IMOTION with contributions from the Swiss National Science Foundation (SNSF, contract no. 20CH21_151571).

Author information

Authors and Affiliations

Databases and Information Systems Research Group, Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
Luca Rossetto, Ivan Giangreco, Ralph Gasser & Heiko Schuldt


Corresponding author

Correspondence to Luca Rossetto.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal


About this paper

Cite this paper

Rossetto, L., Giangreco, I., Gasser, R., Schuldt, H. (2018). Competitive Video Retrieval with vitrivr. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_41


DOI: https://doi.org/10.1007/978-3-319-73600-6_41
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science, Computer Science (R0)
