Free-form Multi-Modal Multimedia Retrieval (4MR)
Due to the ever-increasing amount of multimedia data, efficient means for multimedia management and retrieval are required. Especially with the rise of deep-learning-based analytics methods, the semantic gap has shrunk considerably, but a human in the loop is still considered mandatory. One of the driving factors of video search is that humans tend to refine their queries after reviewing the results; hence, the entire process is highly interactive. A natural approach to interactive video search is to use textual descriptions of the expected result's content, enabled by deep-learning-based joint visual-text co-embedding. In this paper, we present the Multi-Modal Multimedia Retrieval (4MR) system, a novel system inspired by vitrivr that empowers users with almost entirely free-form query formulation methods. The top-ranked teams of the last few iterations of the Video Browser Showdown have shown that CLIP provides an ideal feature extraction method. Therefore, while 4MR is capable of image and text retrieval as well, video retrieval for VBS is based primarily on CLIP.
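To illustrate the joint visual-text co-embedding principle the abstract refers to, the following is a minimal sketch of CLIP-based text-to-keyframe retrieval using the publicly available OpenAI CLIP package. It is an assumption for illustration only and not the 4MR implementation: keyframes are embedded offline, a free-form text query is embedded at search time, and frames are ranked by cosine similarity in the shared embedding space.

```python
# Minimal sketch (assumed, not the 4MR codebase): CLIP-based text-to-frame retrieval.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_frames(paths):
    """Encode keyframe image paths into L2-normalised CLIP vectors (offline step)."""
    images = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        feats = model.encode_image(images)
    return feats / feats.norm(dim=-1, keepdim=True)

def rank_by_text(query, frame_feats, k=5):
    """Return indices of the k frames most similar to a free-form text query."""
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = frame_feats @ text_feat.T  # cosine similarity, since both are normalised
    return scores.squeeze(1).topk(min(k, len(frame_feats))).indices.tolist()

# Hypothetical usage with placeholder file names:
# frame_feats = embed_frames(["frame_0001.jpg", "frame_0002.jpg"])
# best = rank_by_text("a person riding a red bicycle", frame_feats, k=2)
```

In a full system, the precomputed frame embeddings would be stored in a vector index so that only the query embedding and the nearest-neighbour lookup happen interactively.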
VBS 2023
This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://dx.doi.org/10.1007/978-3-031-27077-2. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-term.