Extending vitrivr-engine with Emotion-Based Retrieval and a Modular User Interface
Interactive video retrieval has traditionally focused on visual, textual, and audio cues, while the emotions conveyed by multimedia content have been largely overlooked as a basis for retrieval. In this work, we introduce a new version of the vitrivr-engine that incorporates emotion-based retrieval as a novel modality, extending established approaches such as visual concept detection, optical character recognition (OCR), and automatic speech recognition (ASR). To achieve this, we integrate deep learning models for facial expression analysis, text-based sentiment classification, and speech emotion recognition, each contributing to a unified representation of affective characteristics in video data. In addition to this new retrieval modality, we present vitrivr-web, a newly developed modular frontend built in React that leverages the modular structure of the vitrivr-engine to offer an adaptable and intuitive user experience. Furthermore, the backend features a new API that simplifies the use of the vitrivr-engine and ensures consistency across all functionalities. Together, these features expand the scope of interactive video retrieval, improving both usability and alignment with human memory processes.