Tapmaan: Temperature-aware Data Management in Polypheny-DB (Master Thesis, Finished)
Author
Description
Operational databases tend to grow continuously, which often results in storing large amounts of rarely accessed data (e.g., information on completed shipments of a parcel service). Since this data often cannot be deleted, for instance for legal reasons, it slows down the whole database.
In data management, it is common practice to describe the access frequency of a data set by its "temperature". Hot means that the data is currently accessed very frequently, while cold means that it has not been accessed for a long time. In between, there can be multiple gradations such as warm.
Multi-temperature data management refers to storing hot data on fast storage, warm data on slightly slower storage, and cold data on slow but very cheap storage.
The goal of this project is to
- develop a cost model which classifies data entities with a temperature depending on certain parameters, for example, the access frequency, the storage cost, the read / write latency, etc.,
- extend Polypheny-DB's storage layer to use this cost model to relocate data at runtime, and
- evaluate the system with different benchmarks.
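The cost-model and relocation goals can be illustrated with a minimal sketch. It is not the model developed in the thesis: the class `EntityStats`, the threshold constants, the tier names and the function names are all hypothetical, and only two of the parameters mentioned above (access frequency and time since last access) are considered.

```python
from dataclasses import dataclass


@dataclass
class EntityStats:
    """Access statistics for one data entity (hypothetical structure)."""
    name: str
    accesses_per_day: float
    days_since_last_access: float


# Hypothetical thresholds; a real cost model would also weigh
# storage cost and read / write latency.
HOT_MIN_ACCESSES_PER_DAY = 100.0
COLD_MIN_IDLE_DAYS = 90.0

# Assumed mapping from temperature to storage tier.
TIER_FOR_TEMPERATURE = {"hot": "fast", "warm": "medium", "cold": "cheap"}


def classify(stats: EntityStats) -> str:
    """Assign a temperature based on idle time and access frequency."""
    if stats.days_since_last_access >= COLD_MIN_IDLE_DAYS:
        return "cold"
    if stats.accesses_per_day >= HOT_MIN_ACCESSES_PER_DAY:
        return "hot"
    return "warm"


def plan_relocations(entities, current_tier):
    """Return (entity, from_tier, to_tier) for every misplaced entity."""
    moves = []
    for entity in entities:
        target = TIER_FOR_TEMPERATURE[classify(entity)]
        source = current_tier[entity.name]
        if source != target:
            moves.append((entity.name, source, target))
    return moves
```

Given an entity of open shipments with 500 accesses per day and an entity of completed shipments last touched 400 days ago, both currently on the fast tier, `plan_relocations` would propose moving only the completed shipments to the cheap tier.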
Because Polypheny-DB is a full-featured relational database management system, it must remain possible to execute complex queries that, for instance, join cold and hot data.
The project includes the extension of the Gavel benchmark to simulate a scenario with massive amounts of data with different access characteristics.
Start / End Dates
2017/11/20 - 2018/05/19