ClouDMan: Cost-based Data Management in Cloud Environments (Finished)

In recent years, Clouds have become increasingly attractive environments for deploying different types of applications. The main reason for this popularity is the 'pay-as-you-go' cost model of the Cloud, combined with its almost unlimited scalability and high availability. From the perspective of organizations or companies using the Cloud, the pay-as-you-go cost model allows them to pay only for the resources they actually use. The traditional problems of over-provisioning (i.e., when IT resources, usually complete compute centers, were designed for a much higher load than was actually faced, which led to additional, unnecessary costs for the organization or company) and under-provisioning (i.e., when customers had to be turned away due to a lack of IT resources) are fortunately a thing of the past. Cloud environments are highly elastic, which means that they provide a vast amount of resources that Cloud customers can use on very short notice, thus guaranteeing that the underlying IT environment adapts and dynamically scales to actual needs. Elastic behavior, almost unlimited scalability, and in particular high availability have strong consequences for data management in the Cloud. A high degree of availability is provided by geographically replicating data inside a Cloud, i.e., by using resources at different sites of a Cloud provider. This, in turn, necessitates distributed transactions to guarantee that data remains consistent.

While distributed transaction management and replication management have been the subject of intensive research over the past decades, the Cloud comes with a new dimension that makes it necessary to reconsider and rethink current approaches, algorithms, and protocols: the cost dimension. As a consequence of the pay-as-you-go cost model of the Cloud, each resource and its usage comes with a price tag, usually at a very fine-grained level. Users of the Cloud have to pay, for instance, for each megabyte of storage used, for each CPU cycle, for incoming and outgoing megabytes of data traffic, and even for each message placed in a queue hosted by a Cloud provider. Even worse, these prices not only differ between Cloud providers, they may also differ significantly between data centers of the same Cloud provider.
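To make the effect of fine-grained pricing concrete, the following is a minimal sketch of how a monthly bill could be estimated per data center. It is an illustration only: the data center names, the price values, the choice of billed resources, and the names `PriceList`, `Usage`, `monthly_cost`, and `cheapest_data_center` are all assumptions and do not come from the project or from any real provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceList:
    """Hypothetical per-unit prices of one data center (arbitrary currency units)."""
    storage_per_gb_month: float
    egress_per_gb: float
    ingress_per_gb: float
    per_million_queue_messages: float


@dataclass(frozen=True)
class Usage:
    """Hypothetical resource usage of one application over one month."""
    stored_gb: float
    egress_gb: float
    ingress_gb: float
    queue_messages: int


def monthly_cost(prices: PriceList, usage: Usage) -> float:
    """Sum the individual price tags: storage, traffic in both directions, queue messages."""
    return (
        prices.storage_per_gb_month * usage.stored_gb
        + prices.egress_per_gb * usage.egress_gb
        + prices.ingress_per_gb * usage.ingress_gb
        + prices.per_million_queue_messages * usage.queue_messages / 1_000_000
    )


def cheapest_data_center(price_lists: dict, usage: Usage) -> str:
    """Return the name of the data center with the lowest estimated bill."""
    return min(price_lists, key=lambda name: monthly_cost(price_lists[name], usage))


# Made-up prices for two data centers of the same (hypothetical) provider.
EXAMPLE_PRICES = {
    "dc-a": PriceList(0.02, 0.10, 0.0, 0.5),
    "dc-b": PriceList(0.03, 0.05, 0.0, 0.4),
}
EXAMPLE_USAGE = Usage(stored_gb=500, egress_gb=200, ingress_gb=300, queue_messages=4_000_000)

if __name__ == "__main__":
    for name, prices in EXAMPLE_PRICES.items():
        print(name, round(monthly_cost(prices, EXAMPLE_USAGE), 2))
    print("cheapest:", cheapest_data_center(EXAMPLE_PRICES, EXAMPLE_USAGE))
```

With these made-up numbers, the data center with the higher storage price still ends up cheaper overall because the workload is dominated by outgoing traffic, which is the kind of trade-off that price differences between sites introduce.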

Hence, the joint consideration of (i) data consistency, (ii) performance, and (iii) cost opens up new areas for research in distributed data management and new possibilities for optimizing existing protocols.

The objective of the ClouDMan project is to investigate new approaches to Cost-based Data Partitioning and to Policy-based Data Management. The first aspect, Cost-based Data Partitioning, takes into account that different sites of a Cloud provider come with different pricing schemes. Therefore, optimizing replicated data management with regard to consistency, performance, and cost needs to seamlessly consider data placement, in addition to the number of replicas and the protocol for propagating updates to replicated data. The second aspect, Policy-based Data Management, takes into account that many applications come with dedicated requirements and restrictions on data placement, performance, cost, or consistency, such as 'data may not be stored outside the country of its origin', 'data management has to be provided as cheaply as possible', and/or '1-copy serializability has to be provided'. The goal is to automatically select the best-suited protocol for meeting the requirements and constraints for replicated data management in the Cloud, on the basis of the specified policies.
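The interplay of the two aspects can be sketched as a small search problem: policies first rule out placements and protocols, and the cheapest remaining combination is chosen. The code below is only an illustrative sketch under simplifying assumptions, not the project's actual method. The site names, cost figures, the two protocol entries, the linear cost formula, and the names `Site`, `Protocol`, `Policy`, `total_cost`, and `select_configuration` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    name: str
    country: str
    cost_per_replica: float  # hypothetical monthly cost of hosting one replica


@dataclass(frozen=True)
class Protocol:
    name: str
    one_copy_serializable: bool
    cost_per_extra_replica: float  # hypothetical update-propagation overhead


@dataclass(frozen=True)
class Policy:
    allowed_countries: frozenset  # 'data may not be stored outside ...'
    require_1sr: bool             # '1-copy serializability has to be provided'
    min_replicas: int             # availability requirement


def total_cost(placement, protocol):
    """Assumed cost model: hosting cost plus a per-replica coordination overhead."""
    extra_replicas = max(len(placement) - 1, 0)
    return sum(s.cost_per_replica for s in placement) + protocol.cost_per_extra_replica * extra_replicas


def select_configuration(sites, protocols, policy):
    """Return (cost, placement, protocol) of the cheapest admissible option, or None."""
    eligible_sites = [s for s in sites if s.country in policy.allowed_countries]
    eligible_protocols = [p for p in protocols if p.one_copy_serializable or not policy.require_1sr]
    best = None
    # Exhaustive search; fine for a handful of sites, not meant to scale.
    for k in range(max(policy.min_replicas, 1), len(eligible_sites) + 1):
        for placement in combinations(eligible_sites, k):
            for protocol in eligible_protocols:
                cost = total_cost(placement, protocol)
                if best is None or cost < best[0]:
                    best = (cost, placement, protocol)
    return best


# Made-up sites and protocols, for illustration only.
SITES = [
    Site("zurich", "CH", 12.0),
    Site("geneva", "CH", 10.0),
    Site("frankfurt", "DE", 7.0),
    Site("dublin", "IE", 6.0),
]
PROTOCOLS = [
    Protocol("eager", one_copy_serializable=True, cost_per_extra_replica=5.0),
    Protocol("lazy", one_copy_serializable=False, cost_per_extra_replica=1.0),
]

if __name__ == "__main__":
    strict = Policy(frozenset({"CH"}), require_1sr=True, min_replicas=2)
    cost, placement, protocol = select_configuration(SITES, PROTOCOLS, strict)
    print(cost, [s.name for s in placement], protocol.name)
```

Relaxing the policy, for example by allowing all three countries and dropping the 1-copy serializability requirement, lets the same search pick cheaper sites and a cheaper protocol. This mirrors the idea that the specified policies, rather than a fixed design, determine the configuration.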

Start / End Dates

01.11.2013 - 30.04.2015

Funding Agencies

Swiss National Science Foundation (SNF)

Funding

86'080.- CHF
