Cost-based flexible Correctness in Data Grid and Data Cloud Infrastructures (Master Thesis, Finished)


Ilir Fetai


Cloud computing is becoming the preferred platform for deploying scalable web applications. Its goal is to provide services, both low-level (CPU, storage) and high-level (authentication services, queues, or payment services), at low cost. It promises virtually unlimited scalability and high availability. The cloud provider is responsible for guaranteeing that data is highly available and that the infrastructure scales to handle heavy loads. This relieves clients of the burden of managing the infrastructure, so that they can concentrate on their core business. Setting up large-scale systems and addressing issues such as availability, security, and scalability requires specialized knowledge. Cloud computing amortizes the expense of hiring people with deep expertise in managing such systems across a large number of customers. Its success is based on economies of scale. The business model is pay-per-use: users avoid upfront investments and pay only for what they consume. Data management forms a crucial part of applications deployed in the cloud. As load increases, application and web servers can easily be scaled out by adding new instances. However, replicating only the servers cannot guarantee high availability of the data, and from a performance point of view the database usually becomes the bottleneck. Scalable and highly available data management systems therefore form a crucial part of the cloud infrastructure. Traditional databases, however, have many limitations when it comes to horizontal scalability. As a result, a new type of data store, the Key-Value store, has been introduced as a means of providing high availability and scalability. Key-Value stores such as Bigtable, PNUTS, and Dynamo are the preferred data stores for applications in the cloud. They achieve high availability and scalability by replicating the data.
However, Key-Value stores provide only relaxed consistency guarantees. This may be sufficient for some types of applications, but other applications require stronger consistency levels, such as One-Copy-Serializability (1SR), which have to be implemented on top of existing cloud services (queue services, lock services). On the other hand, providing 1SR in the cloud is costly and strongly reduces concurrency. This master project focuses on two aspects. First, we will define an economic model for the management of transactions over replicated data in the cloud. Second, we will provide a simple API for specifying the desired consistency at the transaction level, together with the cost that users are willing to pay to enforce the specified consistency level. This makes it possible to satisfy the differing consistency requirements of different applications while keeping costs under control.
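
To make the second aspect more concrete, the following Python sketch illustrates what a per-transaction declaration of desired consistency and cost budget might look like. It is not the API developed in the thesis: the names `Consistency`, `TransactionPolicy`, and `effective_level`, the cost figures, and the rule of falling back to a weaker level when the budget is too small are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    """Hypothetical consistency levels, ordered from weakest to strongest."""
    RELAXED = 1                # what a Key-Value store offers natively
    ONE_COPY_SERIALIZABLE = 2  # 1SR, built on top of additional cloud services


# Assumed per-transaction enforcement cost in arbitrary units (made-up numbers).
ASSUMED_COST = {
    Consistency.RELAXED: 1.0,
    Consistency.ONE_COPY_SERIALIZABLE: 10.0,
}


@dataclass(frozen=True)
class TransactionPolicy:
    """What a client declares for a single transaction (hypothetical)."""
    desired: Consistency  # strongest level the application asks for
    max_cost: float       # most the client is willing to pay for this transaction


def effective_level(policy: TransactionPolicy) -> Optional[Consistency]:
    """Pick the strongest level that does not exceed the desired level
    and whose assumed cost fits the budget; None if nothing is affordable."""
    affordable = [
        level
        for level in Consistency
        if level <= policy.desired and ASSUMED_COST[level] <= policy.max_cost
    ]
    return max(affordable) if affordable else None
```

Under these assumptions, a transaction that requests 1SR with a budget of 20 units runs under 1SR, the same request with a budget of 5 units is downgraded to relaxed consistency, and a request whose budget covers no level at all yields `None`. Whether a real system should downgrade or reject such a transaction is a design choice that this sketch does not settle.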

Start / End Dates

2010/10/20 - 2011/04/19


Research Topics