Analyzing the Performance of Data Replication and Data Partitioning in the Cloud: the Beowulf Approach

Authors
Alexander Stiemer, Ilir Fetai and Heiko Schuldt
Type
In Proceedings
Date
2016/12
Appears in
Proceedings of the 4th International Workshop on Scalable Cloud Data Management (SCDM 2016) - co-located with IEEE Big Data 2016
Location
Washington, D.C., USA
Publisher
IEEE Computer Society
Pages
2837 – 2846
Abstract

Applications deployed in the Cloud usually come with dedicated performance and availability requirements. These requirements can be met by replicating data across several sites and/or by partitioning data. Data replication allows read requests to be parallelized and thus decreases data access latency, but it induces significant overhead for synchronizing updates. Partitioning, in contrast, is highly beneficial if all the data accessed by an application is located at the same site, but it too requires coordination when distributed transactions are needed to serve applications. In this paper, we analyze three protocols for distributed data management in the Cloud, namely Read-One Write-All-Available (ROWAA), Majority Quorum (MQ), and Data Partitioning (DP), all in a configuration that guarantees strong consistency. We introduce Beowulf, a meta protocol based on a comprehensive cost model that integrates the three protocols and dynamically selects the one with the lowest latency for a given workload. In the evaluation, we compare the predictions of the Beowulf cost model with a baseline evaluation. The results show that the analytical model is effective and that it selects the best-suited protocol for a given workload with high precision.
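The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the Beowulf cost model: the latency formulas, the constants (LOCAL_MS, ROUND_TRIP_MS, SITES, DISTRIBUTED_ROUNDS), and all function names are assumptions made up for illustration. The sketch only shows the general shape of a meta protocol that predicts a latency per protocol for a workload and picks the lowest.

```python
from dataclasses import dataclass

# All constants below are hypothetical; they are not taken from the paper.
LOCAL_MS = 1.0           # assumed cost of serving a request at one site
ROUND_TRIP_MS = 5.0      # assumed cost of contacting one additional site
SITES = 5                # assumed number of replica sites
DISTRIBUTED_ROUNDS = 3   # assumed coordination rounds for a distributed transaction


@dataclass(frozen=True)
class Workload:
    read_ratio: float         # fraction of requests that are reads, in [0, 1]
    distributed_ratio: float  # fraction of transactions spanning several partitions


def predicted_latencies(w: Workload) -> dict:
    """Return a toy latency estimate (ms) for each protocol."""
    # ROWAA: reads hit one replica, writes are propagated to every other site.
    rowaa_write = LOCAL_MS + ROUND_TRIP_MS * (SITES - 1)
    rowaa = w.read_ratio * LOCAL_MS + (1 - w.read_ratio) * rowaa_write

    # MQ: reads and writes both contact a majority of the sites.
    quorum = SITES // 2 + 1
    mq = LOCAL_MS + ROUND_TRIP_MS * (quorum - 1)

    # DP: local transactions are cheap, distributed ones need coordination.
    dp_distributed = LOCAL_MS + ROUND_TRIP_MS * DISTRIBUTED_ROUNDS
    dp = (1 - w.distributed_ratio) * LOCAL_MS + w.distributed_ratio * dp_distributed

    return {"ROWAA": rowaa, "MQ": mq, "DP": dp}


def select_protocol(w: Workload) -> str:
    """Pick the protocol with the lowest predicted latency."""
    costs = predicted_latencies(w)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for w in (Workload(0.95, 0.5), Workload(0.2, 0.05), Workload(0.2, 1.0)):
        print(w, "->", select_protocol(w))
```

Under these made-up constants, a read-heavy workload favours ROWAA, a write-heavy workload with mostly local transactions favours DP, and a write-heavy workload with mostly distributed transactions favours MQ. A real cost model would derive its parameters from measurements rather than from fixed constants.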