General Purpose Multimedia Retrieval with vitrivr at LSC’24
The collection of lifelog data (visual and multi-sensory data, including biometric and spatiotemporal metadata) becomes easier every year and is increasingly supported by commercial products. Lifelog data is naturally multi-modal, arguably with a major audio-visual component such as captured videos, audio recordings, and photos. For lifelog retrieval, the challenges of managing and accessing (visual) multimedia content are therefore paired with the challenges of semi-structured and heterogeneous metadata. One approach to these challenges is to combine general-purpose, content-based multimedia retrieval with traditional Boolean retrieval. In this paper, we present the latest iteration of vitrivr, a long-running participant in the Lifelog Search Challenge. After successfully replacing the retrieval engine Cineast with the vitrivr-engine for the structurally related Video Browser Showdown, we adapt this general-purpose, content-based multimedia retrieval system to lifelog retrieval by extending the modular retrieval engine with Boolean retrieval and a model for metadata. In doing so, we continue to generalize retrieval aspects that are also suitable for other applications, and we evaluate our system at the Lifelog Search Challenge 2024.