Simulation Suite for Competitive Interactive Multimedia Retrieval Evaluation (Master Project, Finished)
DRES (Distributed Retrieval Evaluation Server) is a tool for designing and orchestrating multimedia retrieval evaluations such as installments of the Video Browser Showdown (VBS) or the Lifelog Search Challenge (LSC).
Within these evaluation campaigns, retrieval systems run a finite number of tasks in a competition setting. The DRES system provides OpenAPI definitions for submitting evaluation task answers as well as interaction and result logs.
Organising evaluation events such as VBS or LSC takes considerable effort and many people to set up and run, so a simulation suite capable of simulating the behaviour of the participating systems is highly beneficial. The work in this project can build on the findings of recent research.
The main goal of this project is to experimentally develop a simulation suite that simulates multimedia retrieval systems and their operators for interactive multimedia retrieval evaluations using DRES. In addition, the simulation suite should be configurable so that it can be used to stress test the DRES system.
Multimedia retrieval is a field of computer science that focuses on developing algorithms and systems to quickly find elements in a collection. Annual interactive competitions are organised to compare such retrieval systems. These competitions use evaluation servers, such as DRES, to allow simultaneous assessments. Simulating such competitions is essential because it avoids the need for many human operators to test these systems. In this project, we develop a simulation suite for multimedia retrieval competitions that consists of two components: the simulation engine, which can simulate an entire competition with participants, and the stress test, which can be used to test the limits of the evaluation server. To simulate the behaviour of the participants during the competition as realistically as possible, we analysed the behaviour of the participants during the Video Browser Showdown (VBS) competition. We identified four behavioural parameters: the probability of solving a task, the probability of submitting a correct item, the search time until the first item, and the search time between items. Based on the analysis of VBS 2022 and VBS 2023, we identified three types of submitters (good, average and bad) that can be used for simulation. By conducting a simulated VBS competition with the evaluation server DRES, we determined that the behaviour of the simulated participants is similar to the behaviour of the real participants. By performing the stress test, we found that memory usage depends on the task type and does not affect the server access time.
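The following is a minimal sketch of how a single simulated participant could be driven by the four behavioural parameters named above. It is not the project's actual simulation engine: the names `SubmitterProfile`, `PROFILES` and `simulate_task`, the numeric values for the three submitter types, and the choice of exponentially distributed search times are all assumptions made for illustration. The sketch also does not talk to DRES; it only produces the submission timeline that a client would then send.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitterProfile:
    """The four behavioural parameters of one simulated participant."""
    p_solve: float           # probability of solving a task
    p_correct: float         # probability that a submitted item is correct
    first_search_s: float    # mean search time until the first item (seconds)
    between_search_s: float  # mean search time between items (seconds)


# Placeholder values for illustration only; NOT taken from the VBS analysis.
PROFILES = {
    "good": SubmitterProfile(0.9, 0.8, 60.0, 20.0),
    "average": SubmitterProfile(0.6, 0.5, 90.0, 30.0),
    "bad": SubmitterProfile(0.3, 0.2, 120.0, 45.0),
}


def simulate_task(profile, task_duration_s, rng):
    """Return a list of (timestamp_s, is_correct) submissions for one task.

    Assumptions of this sketch: search times are exponentially distributed,
    a participant who will not solve the task only submits incorrect items,
    and a participant stops submitting after the first correct item.
    """
    will_solve = rng.random() < profile.p_solve
    submissions = []
    t = rng.expovariate(1.0 / profile.first_search_s)
    while t < task_duration_s:
        is_correct = will_solve and rng.random() < profile.p_correct
        submissions.append((t, is_correct))
        if is_correct:
            break
        t += rng.expovariate(1.0 / profile.between_search_s)
    return submissions


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    for name, profile in PROFILES.items():
        timeline = simulate_task(profile, task_duration_s=300.0, rng=rng)
        print(name, [(round(t, 1), ok) for t, ok in timeline])
```

Keeping the parameters in a small immutable profile makes it straightforward to swap in values fitted to real competition logs, and passing the random generator explicitly keeps simulated competitions repeatable.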
Start / End Dates
2023/02/18 - 2023/05/01