Abstract
This paper presents the competitive video retrieval capabilities of vitrivr. The vitrivr stack is the continuation of the IMOTION system, which has participated in the Video Browser Showdown competitions since 2015. The primary focus of vitrivr and its participation in this competition is to simplify and generalize the system’s individual components, making them easier to deploy and use. The entire vitrivr stack is made available as open source software.
Keywords
- Video Browser Showdown
- Large-scale Video Retrieval
- Exact Query Results
- Primary System Components
- Angular Framework
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
1 Introduction
In this paper, we present the current iteration of vitrivr [6], an open-source content-based multimedia retrieval stack. The vitrivr stack is the continuation of the IMOTION system [3, 5, 7, 8], which participated in previous iterations of the Video Browser Showdown [1]. Despite offering some new functionality, the primary focus of this year's participation lies in the simplification and generalization of the retrieval stack in order to make it easier for both experts and laymen to adapt, deploy, and use. The vitrivr stack is available in its entirety from https://vitrivr.org.
The remainder of this paper is structured as follows: Sect. 2 provides a brief overview of the overall system architecture and Sect. 3 summarizes all query types supported by vitrivr. Section 4 provides details on the functionalities introduced in the current version. In Sect. 5, we briefly outline our reasoning behind the open sourcing of vitrivr and Sect. 6 concludes.
2 Architectural Overview
The vitrivr stack – like its predecessor IMOTION – consists of three primary system components: the storage layer \(\textsf {ADAM}_{{pro}}\) [2], the retrieval engine Cineast [4], and a browser-based user interface. Additionally, a web server is used to serve static content such as videos and thumbnail images. Additional details on the architecture of the entire stack can be found in [6].
3 Interaction and Query Types
The vitrivr stack offers various ways to specify queries, most of which fall into two categories: visual and textual. The visual query modes include Query-by-Sketch, Query-by-Example, and Relevance Feedback, all of which are based on visual input such as user-generated sketches of a scene or one or more previously retrieved scenes. These queries are evaluated against data extracted directly from the video frames. The textual queries are based on information that can be extracted from the video content and represented as text, such as spoken language, on-screen text, or the provided textual video metadata. For this, we use the ASR data provided with the video data set as well as several object detectors to produce labels for the shots. OCR is applied to make text that appears on screen searchable as well.
4 New Functionality
While the IMOTION system that participated in previous instances of VBS was always a specialized piece of software, purpose-built for the competition, the functionality we added to vitrivr in preparation for this iteration of VBS is also useful for other use cases of vitrivr.
4.1 New User Interface
The most salient difference from the IMOTION system of the previous year is the new user interface. While still browser-based, the latest iteration of the UI is built on the Angular framework. Its modular structure makes it easy to customize the entire UI or parts thereof to shift its focus from general-purpose multimedia retrieval to, in this case, competitive video retrieval.
As in the past, the UI supports result streaming so that partial results can be presented to the user while the query is still being processed by the backend. It no longer achieves this via AJAX requests, however, but uses a WebSocket connection to Cineast. A REST API is also available. The stack still includes a web server which provides static content such as shot thumbnails and the videos themselves, but it is no longer required to act as a proxy between the browser and Cineast. The screenshot in Fig. 1 depicts the current version of the UI.
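The idea of result streaming can be illustrated with a minimal sketch. It is not the vitrivr UI code (which is written for Angular) and does not reproduce Cineast's actual message format: the JSON layout, the field names `results`, `shot`, and `score`, and the function `merge_partial_results` are all assumptions made for illustration, and a plain generator stands in for the WebSocket connection. The sketch only shows how a client might update a ranked list each time a partial batch arrives instead of waiting for the complete result.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def merge_partial_results(
    messages: Iterable[str], top_k: int = 5
) -> Iterator[List[Tuple[str, float]]]:
    """Yield the current top-k ranking after every incoming message.

    Each message is assumed (hypothetically) to be a JSON string of the form
    {"results": [{"shot": "<id>", "score": <float>}, ...]}.
    """
    best: Dict[str, float] = {}
    for raw in messages:
        batch = json.loads(raw)
        for item in batch["results"]:
            shot, score = item["shot"], float(item["score"])
            # Keep the highest score seen so far for each shot.
            if score > best.get(shot, float("-inf")):
                best[shot] = score
        # Re-rank after every batch: highest score first, ties broken by id.
        ranking = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        yield ranking[:top_k]


def simulated_stream() -> Iterator[str]:
    """Stand-in for messages arriving over a WebSocket connection."""
    yield json.dumps({"results": [{"shot": "v1_s3", "score": 0.61},
                                  {"shot": "v2_s1", "score": 0.40}]})
    yield json.dumps({"results": [{"shot": "v7_s9", "score": 0.83},
                                  {"shot": "v2_s1", "score": 0.55}]})


# One snapshot of the ranking per received message.
snapshots = list(merge_partial_results(simulated_stream(), top_k=2))
for number, ranking in enumerate(snapshots, start=1):
    print(f"after message {number}: {ranking}")
```

Because the ranking is recomputed per batch, the user sees a usable result list after the first message, and later batches can reorder or replace entries as further results come in.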
4.2 Approximate Retrieval
The underlying storage engine \(\textsf {ADAM}_{{pro}}\) [2] supports multiple index structures for efficient vector space retrieval. Many of these index structures achieve their high efficiency by approximating results rather than producing the true nearest neighbors of a query vector. In previous system iterations, we only made use of exact query results, which led to longer query times. In the current iteration of the system, the choice between exact and approximate queries can be made at query time. Hence, the user can trade some accuracy for substantially shorter query times.
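The trade-off between exact and approximate retrieval can be sketched with a toy example. The code below does not use \(\textsf {ADAM}_{{pro}}\) or any of its index structures. It swaps in a deliberately simple coarse-partitioning scheme (vectors are assigned to the nearest of a few randomly chosen centres, and an approximate query scans only the cell closest to the query). The class `ToyVectorStore`, its `knn` method, and the `approximate` flag are hypothetical names used only to show how such a choice could be exposed at query time.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyVectorStore:
    """Hypothetical in-memory store with an exact and an approximate mode."""

    def __init__(self, vectors: Dict[str, Vector], num_cells: int = 4, seed: int = 0):
        self.vectors = dict(vectors)
        rng = random.Random(seed)
        ids = sorted(self.vectors)
        # Use a few stored vectors as cell centres (a crude partitioning).
        chosen = rng.sample(ids, min(num_cells, len(ids)))
        self.centres = [self.vectors[i] for i in chosen]
        self.cells: List[List[str]] = [[] for _ in self.centres]
        for vid, vec in self.vectors.items():
            self.cells[self._nearest_cell(vec)].append(vid)

    def _nearest_cell(self, vec: Vector) -> int:
        return min(range(len(self.centres)),
                   key=lambda c: distance(vec, self.centres[c]))

    def knn(self, query: Vector, k: int, approximate: bool = False) -> List[Tuple[str, float]]:
        """Return up to k (id, distance) pairs, nearest first.

        approximate=False scans every vector (true nearest neighbours);
        approximate=True scans only the cell closest to the query, which is
        cheaper but may miss neighbours that lie in other cells.
        """
        if approximate:
            candidates = self.cells[self._nearest_cell(query)]
        else:
            candidates = list(self.vectors)
        scored = sorted((distance(query, self.vectors[v]), v) for v in candidates)
        return [(v, d) for d, v in scored[:k]]


rng = random.Random(42)
data = {f"shot_{i}": [rng.random() for _ in range(8)] for i in range(200)}
store = ToyVectorStore(data, num_cells=8)
query = [rng.random() for _ in range(8)]

exact = store.knn(query, k=5)                     # full scan
approx = store.knn(query, k=5, approximate=True)  # single-cell scan
print("exact: ", [v for v, _ in exact])
print("approx:", [v for v, _ in approx])
```

The approximate mode inspects only a fraction of the stored vectors, so it can return fewer or different neighbours than the exact scan; its best match can never be closer than the exact one.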
5 Open Source
The entire vitrivr stack [6] is published under the MIT license. The source code of all its components is available from their individual GitHub repositories, and additional documentation can be found on https://vitrivr.org. Being a general-purpose multimedia retrieval stack, vitrivr has many applications outside of competitive video retrieval, as it also supports other domains such as images, audio, and 3D models. With this flexible open source stack, we hope to offer the community a basis for future research in many areas and domains of multimedia retrieval.
6 Conclusions
With the competitive video retrieval version of vitrivr, we plan to continue the successful participations we had with the IMOTION system in the past. It is our hope that by publishing the entire retrieval stack as open source software, we lower the entry hurdle for future participants and accelerate the testing of new ideas in the context of large-scale video retrieval.
References
[1] Cobârzan, C., Schoeffmann, K., Bailer, W., Hürst, W., Blažek, A., Lokoč, J., Vrochidis, S., Barthel, K.W., Rossetto, L.: Interactive video search tools: a detailed analysis of the video browser showdown 2015. Multimedia Tools Appl. 76(4), 5539–5571 (2017)
[2] Giangreco, I., Schuldt, H.: ADAM\(_{pro}\): database support for big multimedia retrieval. Datenbank-Spektrum 16(1), 17–26 (2016)
[3] Rossetto, L., Giangreco, I., Heller, S., Tănase, C., Schuldt, H., Dupont, S., Seddati, O., Sezgin, M., Altıok, O.C., Sahillioğlu, Y.: IMOTION - searching for video sequences using multi-shot sketch. In: Tian, Q., Sebe, N., Qi, G.-J., Huet, B., Hong, R., Liu, X. (eds.) MMM 2016. LNCS, vol. 9517, pp. 377–382. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27674-8_36
[4] Rossetto, L., Giangreco, I., Schuldt, H.: Cineast: a multi-feature sketch-based video retrieval engine. In: Proceedings of the 2014 IEEE International Symposium on Multimedia (ISM 2014), Taichung, Taiwan, pp. 18–23. IEEE Computer Society, December 2014
[5] Rossetto, L., Giangreco, I., Schuldt, H., Dupont, S., Seddati, O., Sezgin, M., Sahillioğlu, Y.: IMOTION - a content-based video retrieval engine. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds.) MMM 2015. LNCS, vol. 8936, pp. 255–260. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14442-9_24
[6] Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H.: vitrivr: a flexible retrieval stack supporting multiple query modes for searching in multimedia collections. In: Proceedings of the 2016 ACM Conference on Multimedia Conference (ACM MM 2016), Amsterdam, The Netherlands, pp. 1183–1186. ACM, October 2016
[7] Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H.: Multimodal video retrieval with the 2017 IMOTION system. In: Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (ICMR 2017), Bucharest, Romania, pp. 457–460. ACM, June 2017
[8] Rossetto, L., Giangreco, I., Tănase, C., Schuldt, H., Dupont, S., Seddati, O.: Enhanced retrieval and browsing in the IMOTION system. In: Amsaleg, L., Guðmundsson, G.Þ., Gurrin, C., Jónsson, B.Þ., Satoh, S. (eds.) MMM 2017. LNCS, vol. 10133, pp. 469–474. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-51814-5_43
Acknowledgements
This work was partly supported by the Chist-Era project IMOTION with contributions from the Swiss National Science Foundation (SNSF, contract no. 20CH21_151571).
Copyright information
© 2018 Springer International Publishing AG
Rossetto, L., Giangreco, I., Gasser, R., Schuldt, H. (2018). Competitive Video Retrieval with vitrivr. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science(), vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_41
DOI: https://doi.org/10.1007/978-3-319-73600-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science, Computer Science (R0)