Temporal Analysis and Validation of Evaluations (Bachelor Thesis, Ongoing)

Author

Patrik Bütler

Description

Evaluations are an important part of the research and development of systems. Validating that benchmarks were executed correctly and comparing their results are crucial for drawing conclusions from those results.

The open-source evaluation system Chronos automates the entire benchmarking process. By reducing the need for human intervention, it also increases the reproducibility and thus the validity of the results.

The aim of this project is to extend the Chronos system with a component that analyzes the log records submitted by the systems under evaluation. This component should identify patterns in the log records that indicate a correct execution, as well as patterns that indicate errors while executing the benchmark. The result of this analysis should be displayed in the UI. Furthermore, the system should be extended to not only analyze the result of a single benchmark run but also compare the results of multiple valid executions of a benchmark.
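To make the idea concrete, the following is a minimal sketch of pattern-based log analysis and a comparison across valid runs. It is not part of Chronos: the function names, the regular expressions, and the log format are all hypothetical and chosen only for illustration. Real patterns would depend on what the systems under evaluation actually log.

```python
import re
from statistics import mean

# Hypothetical patterns (assumptions for illustration only).
SUCCESS_PATTERNS = [re.compile(r"benchmark (finished|completed)", re.IGNORECASE)]
ERROR_PATTERNS = [re.compile(r"\b(ERROR|FATAL|Exception)\b")]


def analyze_log(lines):
    """Return (is_valid, error_lines) for the log records of one run.

    A run counts as valid if at least one success pattern matches
    and no line matches an error pattern.
    """
    errors = [ln for ln in lines if any(p.search(ln) for p in ERROR_PATTERNS)]
    succeeded = any(p.search(ln) for ln in lines for p in SUCCESS_PATTERNS)
    return succeeded and not errors, errors


def compare_valid_runs(runs):
    """Average a metric over the runs whose logs pass the analysis.

    `runs` is a list of (log_lines, metric_value) pairs; invalid runs
    are excluded. Returns None if no run is valid.
    """
    valid = [metric for log, metric in runs if analyze_log(log)[0]]
    return mean(valid) if valid else None
```

In this sketch a run containing an error line is excluded from the comparison even if it also logs a success message, which mirrors the requirement that only valid executions are compared.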

In more detail, the objectives of this project are:

 

If time allows, the project can be extended to add further enhancements to the Chronos stack that foster the reproducibility of the evaluations and the usability of the systems, for instance, improving the scheduling of jobs or adding a better separation between benchmarks and systems under evaluation.

Start / End Dates

2024/04/17 - 2024/08/16

Supervisors

Research Topics