Multi-Model Stream Processing Benchmark (Bachelor Thesis, Ongoing)
Author
Description
The exponential growth of real-time data generated by sources such as social media, sensors, and Internet of Things devices has made stream processing a critical area of data management. Stream processing frameworks continuously process and analyze data in motion, which allows insights to be derived immediately. Unlike traditional data management systems, which operate on data at rest, these frameworks must ingest, process, and analyze data continuously as it flows through the system. This paradigm shift introduces significant complexity and demands specialized knowledge far beyond that required for conventional data management.
One of the primary challenges in stream processing lies in managing heterogeneous workloads that vary in data structure, volume, velocity, and complexity.
While such systems mostly focus on handling fluctuating workload frequencies, high heterogeneity and the lack of data model guarantees are often only an afterthought. As a consequence, these systems largely rely on developers to provide such guarantees themselves, which leads to highly complex and specialized systems.
This thesis aims to investigate these challenges through a comprehensive analysis of existing stream processing benchmarks and approaches, with a specific focus on their ability to capture heterogeneous workloads. Based on this analysis, a custom stream processing benchmark will be conceptualized and developed. This benchmark will simulate diverse real-world scenarios and evaluate key performance metrics such as latency, throughput, fault tolerance, scalability, and the handling of varying degrees of workload heterogeneity.
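To illustrate how the first two of these metrics could be derived from a benchmark run, the following is a minimal sketch, assuming that every event is timestamped once when the data generator hands it to the framework and once when the corresponding result leaves the framework. The names `EventRecord`, `summarize_run`, and the `model` field (tagging each event with its data model) are hypothetical and are not taken from the thesis; the sketch is not the benchmark's actual design. Fault tolerance and scalability would require separate experiments (e.g., injected failures, varying parallelism) and are not covered here.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """One measured event. Hypothetical schema, not taken from the thesis."""
    model: str        # data model of the event, e.g. "relational" or "document"
    ingest_ts: float  # seconds: when the generator handed the event to the framework
    emit_ts: float    # seconds: when the corresponding result left the framework


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending, non-empty list; q in (0, 100]."""
    if not sorted_values:
        raise ValueError("percentile of an empty list")
    rank = math.ceil(q / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize_run(events):
    """End-to-end latency percentiles, overall throughput, and per-model p99 latency."""
    if not events:
        raise ValueError("no events recorded")
    latencies = sorted(e.emit_ts - e.ingest_ts for e in events)
    # Throughput is measured over the span from the first ingestion to the last emission.
    duration = max(e.emit_ts for e in events) - min(e.ingest_ts for e in events)

    # Group latencies by data model to expose how heterogeneity affects each part of the workload.
    by_model = defaultdict(list)
    for e in events:
        by_model[e.model].append(e.emit_ts - e.ingest_ts)

    return {
        "p50_latency_s": percentile(latencies, 50),
        "p99_latency_s": percentile(latencies, 99),
        "throughput_eps": len(events) / duration if duration > 0 else math.inf,
        "p99_latency_by_model_s": {
            model: percentile(sorted(values), 99) for model, values in by_model.items()
        },
    }


if __name__ == "__main__":
    run = [
        EventRecord("relational", 0.0, 1.0),
        EventRecord("document", 2.0, 4.0),
        EventRecord("relational", 4.0, 7.0),
        EventRecord("document", 6.0, 11.0),
    ]
    print(summarize_run(run))
```

The nearest-rank percentile is used because it always returns an observed latency rather than an interpolated one; the per-model breakdown is one possible way to make the effect of heterogeneous input visible instead of hiding it in a single aggregate number.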
The complexity of stream processing will be explored further by implementing and running this benchmark on a widely used stream processing framework. This will provide insights into how well the framework performs under different workload conditions and under varying degrees of workload change.
Start / End Dates
2024/09/11 - 2025/01/10