Extending vitrivr-engine with Emotion-Based Retrieval and a Modular User Interface
Interactive video retrieval has traditionally focused on visual, textual, and audio cues, while the emotions conveyed by multimedia content have been largely overlooked as a basis for retrieval. In this work, we introduce a new version of the vitrivr-engine that incorporates emotion-based retrieval as a novel modality, extending established approaches such as visual concept detection, optical character recognition (OCR), and automatic speech recognition (ASR). To achieve this, we integrate deep learning models for facial expression analysis, text-based sentiment classification, and speech emotion recognition, each contributing to a unified representation of affective characteristics in video data. In addition to this new retrieval modality, we present vitrivr-web, a newly developed modular frontend built in React that leverages the modular structure of the vitrivr-engine to offer an adaptable and intuitive user experience. Furthermore, the backend features a new API that simplifies the use of the vitrivr-engine and ensures consistency across all functionalities. Together, these features expand the scope of interactive video retrieval, improving both usability and alignment with human memory processes.