Hybrid Data Storage and Access for Big Data Collections (Master Thesis, Finished)


Marco Vogt


In the decade since Michael Stonebraker argued that the "One Size Fits All" approach to data stores has failed, many special-purpose data stores have been developed. Because each of these stores is typically well suited to only one type of workload, there is a need for a system that unifies their advantages and can therefore handle diverse workloads.

We propose the idea of a physical and logical hybrid database system for big data applications that combines different specialized data stores running on different storage systems depending on the workload.

In this thesis, we introduce Icarus, an adaptive multi-store database system that uses multiple storage engines simultaneously. Icarus improves the execution time of a query by automatically routing it to the data store with the best characteristics for executing that kind of query. The underlying routing table is learned autonomously by continuously analyzing query execution times.
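The routing idea can be illustrated with a minimal sketch. This is not the algorithm used in Icarus; it only assumes that queries can be grouped into classes and that a router tracks the mean observed execution time per class and store. The class name AdaptiveRouter, the store names, and the query class labels are all hypothetical.

```python
from collections import defaultdict


class AdaptiveRouter:
    """Toy routing table (hypothetical, not the Icarus implementation)."""

    def __init__(self, stores):
        self.stores = list(stores)
        # (query_class, store) -> [total_seconds, sample_count]
        self.stats = defaultdict(lambda: [0.0, 0])

    def record(self, query_class, store, seconds):
        """Feed back one measured execution time."""
        entry = self.stats[(query_class, store)]
        entry[0] += seconds
        entry[1] += 1

    def route(self, query_class):
        """Return the store that should execute this class of query."""
        # Try every store at least once before trusting the averages.
        for store in self.stores:
            if self.stats[(query_class, store)][1] == 0:
                return store
        # Otherwise pick the store with the lowest mean execution time.
        return min(self.stores, key=lambda s: self._mean(query_class, s))

    def _mean(self, query_class, store):
        total, count = self.stats[(query_class, store)]
        return total / count


# Example with made-up timings (in seconds) for two assumed stores.
router = AdaptiveRouter(["row_store", "column_store"])
router.record("point_lookup", "row_store", 0.002)
router.record("point_lookup", "column_store", 0.015)
router.record("aggregation", "row_store", 0.900)
router.record("aggregation", "column_store", 0.120)

lookup_target = router.route("point_lookup")
aggregation_target = router.route("aggregation")
```

With these made-up timings, point lookups go to the row store and aggregations to the column store. A query class that has not been seen yet is first sent to each store in turn, so that every store has at least one measurement before the averages are compared.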

A major challenge in building a multi-store database system is the lack of a universal query language supported by all data stores. We introduce PolySQL, the query language of Icarus, which serves as the basis for deriving the data stores’ specific query statements.

We propose the Hammer benchmark along with a reference implementation, which is used to evaluate Icarus. The Hammer benchmark models a realistic application for Polystores, combining different types of queries.

The results of the evaluation show that Icarus significantly improves the overall execution time compared to its underlying data stores. This result brings us a major step closer to realizing our vision of an adaptive and distributed Polystore.

Start / End Dates

2016/10/06 - 2017/04/05


Research Topics