Workload-Driven Adaptive Data Partitioning and Distribution – The Cumulus Approach

Ilir Fetai, Damian Murezzan and Heiko Schuldt
In Proceedings
Appears in: The Proceedings of the 2015 IEEE International Conference on Big Data (Big Data)
Santa Clara, CA, USA
IEEE Computer Society

Cloud environments usually feature several geographically distributed data centers. To increase the scalability of applications, many Cloud providers partition data and distribute these partitions across data centers to balance the load. However, if the partitions are not carefully chosen, transactions may access data in several partitions and thus become distributed transactions. Such transactions are particularly expensive when applications require strong consistency guarantees: the additional synchronization needed for atomic commitment strongly impacts transaction throughput and can even completely undo the gain achieved by load balancing. Hence, it is beneficial to avoid distributed transactions as much as possible by partitioning the data in such a way that transactions can be executed locally. As the access patterns of characteristic transaction workloads may change over time, the partitioning also needs to be updated dynamically.

In this paper, we introduce Cumulus, an adaptive data partitioning approach that is able to identify characteristic access patterns of transaction mixes, to determine data partitions based on these patterns, and to dynamically re-partition data if the access patterns change. In an evaluation based on the TPC-C benchmark, we show that Cumulus significantly increases the overall system performance in an OLTP setting compared to static data partitioning approaches. Moreover, we show that Cumulus is able to adapt to workload shifts at runtime by generating partitions that match the actual workload and by re-configuring the system on the fly.


DOI: 10.1109/BigData.2015.7363940

Workshop: 3rd International Workshop on Scalable Cloud Data Management (SCDM 2015)