Temporal Multimodal Video and Lifelog Retrieval

Authors
Silvan Heller
Type
PhD Thesis
Date
2023/3
Appears in
PhD Thesis, Department of Mathematics and Computer Science
Location
University of Basel, Switzerland
Abstract

The past decades have seen exponential growth of both consumption and production of data, with multimedia such as images and videos contributing significantly to said growth. The widespread proliferation of smartphones has provided everyday users with the ability to consume and produce such content easily. As the complexity and diversity of multimedia data has grown, so has the need for more complex retrieval models which address the information needs of users. Finding relevant multimedia content is central in many scenarios, from internet search engines and medical retrieval to querying one’s personal multimedia archive, also called lifelog. Traditional retrieval models have often focused on queries targeting small units of retrieval, yet users usually remember temporal context and expect results to include this. However, there is little research into enabling these information needs in interactive multimedia retrieval. In this thesis, we aim to close this research gap by making several contributions to multimedia retrieval with a focus on two scenarios, namely video and lifelog retrieval. We provide a retrieval model for complex information needs with temporal components, including a data model for multimedia retrieval, a query model for complex information needs, and a modular and adaptable query execution model which includes novel algorithms for result fusion. The concepts and models are implemented in vitrivr, an open-source multimodal multimedia retrieval system, which covers all aspects from extraction to query formulation and browsing. vitrivr has proven its usefulness in evaluation campaigns and is now used in two large-scale interdisciplinary research projects. We show the feasibility and effectiveness of our contributions in two ways: firstly, through results from user-centric evaluations which pit different user-system combinations against one another. Secondly, we perform a system-centric evaluation by creating a new dataset for temporal information needs in video and lifelog retrieval with which we quantitatively evaluate our models. The results show significant benefits for systems that enable users to specify more complex information needs with temporal components. Participation in interactive retrieval evaluation campaigns over multiple years provides insight into possible future developments and challenges of such campaigns.

Staff members
Research Projects