Temporal Multimodal Video and Lifelog Retrieval (PhD Thesis, ongoing)

The past decades have seen exponential growth in both the production and consumption of data, with multimedia data such as images and videos contributing significantly to this growth. The widespread proliferation of smartphones has given everyday users the ability not only to consume but also to produce multimedia content easily and in large volumes.

As the complexity and diversity of multimedia data have grown, so has the need for more complex retrieval models. Conventional retrieval models have often focused on queries targeting small units of retrieval. In domains such as video or lifelog retrieval, however, users are usually looking for longer sequences and remember the temporal context of the desired item.

Recent benchmarking campaigns of retrieval systems have shown that even simple retrieval models which enable such queries can be very successful. However, there is little research on such holistic retrieval models and their evaluation.


In this thesis, we aim to close this research gap by making several contributions to the fields of content-based video and lifelog retrieval. We present a retrieval model for complex information needs with temporal components. It comprises a data and query model for multimedia retrieval and a modular query execution model that includes novel algorithms for result fusion and can be adapted to future research developments.


The concepts and models are implemented in vitrivr, an open-source multimodal multimedia retrieval system which has proven its competitiveness in evaluation campaigns, has participated in Google Summer of Code, and is now used in multiple large-scale interdisciplinary research projects.


We demonstrate the usefulness and effectiveness of our contributions in two ways. First, we present results from user-centric evaluations that pit different user-system combinations against one another. Second, we perform a system-centric evaluation by creating a new dataset for temporal information needs in video and lifelog retrieval and using it to quantitatively evaluate our contributions.


Participation in interactive retrieval evaluation campaigns over multiple years provides insight into possible future developments and challenges of such campaigns. Our results also show that systems which enable users to specify more complex information needs with temporal components offer significant benefits.



Start Date


Funding Agencies

Partially funded by the Swiss National Science Foundation.

Research Topics