Temporal Analysis and Validation of Evaluations (Bachelor Thesis, Finished)
Author
Description
Evaluations are an important part of the research and development of systems. Validating the correct execution of benchmarks and comparing their results is crucial for drawing sound conclusions.
The open-source evaluation system Chronos automates the entire benchmarking process. By reducing the need for human intervention, it also increases the reproducibility and thus the validity of the results.
The aim of this project is to extend the Chronos system with a component that analyzes the log records submitted by the systems under evaluation. This component should identify patterns in the log records that indicate a correct execution, as well as patterns that indicate errors while executing the benchmark. The result of this analysis should be displayed in the UI. Furthermore, the system should be extended to not only analyze the result of a single benchmark run but also compare the results of multiple valid executions of a benchmark.
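To illustrate the idea of pattern-based log analysis, the following is a minimal sketch in Python. It is not taken from the Chronos code base: the names `LogPattern` and `analyze_log`, the example regular expressions, and the rule "valid means at least one success pattern and no error pattern" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPattern:
    """A user-definable pattern (hypothetical structure, not the Chronos schema)."""
    name: str
    regex: str
    indicates_error: bool


def analyze_log(lines, patterns):
    """Count pattern hits in the log lines and derive a simple validity verdict."""
    compiled = [(p, re.compile(p.regex)) for p in patterns]
    hits = {p.name: 0 for p in patterns}
    for line in lines:
        for pattern, rx in compiled:
            if rx.search(line):
                hits[pattern.name] += 1
    # Assumed rule: a run is valid if some success pattern matched
    # and no error pattern matched.
    has_error = any(hits[p.name] > 0 for p in patterns if p.indicates_error)
    has_success = any(hits[p.name] > 0 for p in patterns if not p.indicates_error)
    return {"hits": hits, "valid": has_success and not has_error}


# Example patterns; a real benchmark would define its own set.
patterns = [
    LogPattern("finished", r"Benchmark finished", indicates_error=False),
    LogPattern("exception", r"Exception|ERROR", indicates_error=True),
]
result = analyze_log(["Starting benchmark", "Benchmark finished in 12s"], patterns)
print(result)
```

Keeping the patterns as plain data rather than code is one way to let each benchmark carry its own pattern set, which could then be edited through a UI.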
In more detail, the objectives of this project are:
- Implement a component that analyzes the log records submitted by the systems under evaluation and identifies a definable set of patterns.
- Extend the UI to make the set of patterns user-definable for each benchmark.
- Display the result of the analysis in the UI.
- Add more graph types and improve the user interface where necessary.
- Add the ability to compare multiple runs of the same benchmark to identify trends.
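As an illustration of comparing multiple runs, the sketch below fits a least-squares line through one metric of the valid runs and reports its slope as a simple trend indicator. The run structure (`valid`, `runtime_s`) and the function names are hypothetical, and a linear fit is only one of several possible ways to detect a trend.

```python
def trend_slope(values):
    """Least-squares slope of values against their run index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two runs are needed to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def runtime_trend(runs):
    """Consider only runs marked valid, in their given (chronological) order."""
    runtimes = [run["runtime_s"] for run in runs if run["valid"]]
    return trend_slope(runtimes)


# Hypothetical run records; the invalid run is excluded from the comparison.
runs = [
    {"valid": True, "runtime_s": 10.0},
    {"valid": False, "runtime_s": 99.0},
    {"valid": True, "runtime_s": 12.0},
    {"valid": True, "runtime_s": 14.0},
]
print(runtime_trend(runs))
```

A positive slope suggests that the metric grows from run to run, a negative slope that it shrinks. Excluding invalid runs first keeps failed executions from distorting the comparison.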
If time allows, the project can be extended with further enhancements to the Chronos stack that foster the reproducibility of the evaluations and the usability of the system. Examples include improving the scheduling of jobs or adding a cleaner separation between benchmarks and systems under evaluation.
Start / End Dates
2024/04/17 - 2024/08/16