Result Robustness in Multimedia Retrieval Evaluations (Master Project, Finished)


Vera Benz


In multimedia retrieval, many different systems, algorithms, and methods are developed with the goal of finding the correct items as quickly as possible. To compare them, they must be evaluated. Recent analyses of such evaluations have shown that the data used is often insufficient to produce robust results.

To address this, the robustness of evaluations can be analyzed. An evaluation is robust if its results remain similar as more data points are added.

In this thesis, we use two significance tests: the two-tailed pairwise sign test to analyze the difference between two methods, and Kendall's Tau rank correlation to analyze the correlation of two ranked lists. In addition, we create a set of 161 queries that differ in wording and in the searched item, so that the set can be used for a robust evaluation of retrieval systems.
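The two significance tests can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's actual evaluation code: the function names are ours, the sign test drops ties and uses the exact two-tailed binomial probability, and the Kendall's Tau variant shown is Tau-a (no tie correction).

```python
from itertools import combinations
from math import comb

def sign_test_p(scores_a, scores_b):
    """Two-tailed pairwise sign test: p-value for the null hypothesis
    that method A outperforms method B as often as the reverse.
    Tied pairs are dropped, as is standard for the sign test."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    # Exact two-tailed binomial probability with success probability 0.5.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

def kendall_tau(rank_a, rank_b):
    """Kendall's Tau-a rank correlation between two rankings of the
    same items: (concordant - discordant) / number of pairs."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, two identical rankings give a Tau of 1.0, fully reversed rankings give -1.0, and a method that wins all five of five non-tied query pairs gets a two-tailed sign-test p-value of 2 / 32 = 0.0625.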

Finally, we use the created query set to evaluate temporal scoring algorithms for the retrieval system vitrivr and apply the two significance tests to perform a robustness analysis. We compare two metrics: the item position, i.e. the position of the searched item in the result list, and its reciprocal rank.
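The two metrics are closely related and can be illustrated with a minimal sketch (the function names are ours, not vitrivr's; positions are 1-based, and a missing item yields a reciprocal rank of 0).

```python
def item_position(results, target):
    """1-based position of the searched (target) item in a ranked
    result list, or None if the item was not retrieved."""
    try:
        return results.index(target) + 1
    except ValueError:
        return None

def reciprocal_rank(results, target):
    """Reciprocal of the item position; 0.0 if the item is absent."""
    pos = item_position(results, target)
    return 1.0 / pos if pos is not None else 0.0
```

So an item retrieved at position 2 has a reciprocal rank of 0.5, while an item that never appears contributes 0.0.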

We analyze the differences between the various algorithms, the correlation of their rankings, and the differences that arise when the method parameters are varied. For the analyzed algorithms and methods we conclude that the evaluation of vitrivr is robust, that a robust analysis of the evaluation results is therefore possible, and that the two metrics provide similar results.

In summary, we created a set of queries that can be used to evaluate different retrieval systems, and we enabled the analysis of retrieval system evaluations in terms of their robustness.

Start / End Dates

2022/08/22 - 2022/10/07


Research Topics