Cost-Based Data Replication in the Cloud (Master's Thesis, Finished)


Daniel Kohler


Data replication is a mechanism for increasing data availability and scalability. Different replication strategies try to provide the user with the desired availability and scalability guarantees. However, these properties are costly in the Cloud: the higher the desired availability guarantees, the higher the costs. For example, to guarantee high data availability, the replicas should be placed as far apart as possible, ideally in different countries, so that if a disaster (e.g., a natural disaster) strikes one country, the data is still available elsewhere. On the other hand, the greater the distance, the higher the network costs, for instance for coordinating updates to these replicas. The goal of this master's thesis is to develop a dynamic cost-based replication model. More concretely, this includes the following activities:

  1. Analyze the cost components of data replication
  2. Define a cost model to handle the trade-off between availability, scalability, and costs
  3. Develop a dynamic replication model based on the cost model
  4. Develop a prototype to show the feasibility of the model
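
To make the trade-off described above concrete, the following is a minimal sketch of what such a cost model could look like. It is not the model developed in the thesis. All names (`Site`, `availability`, `total_cost`, `cheapest_placement`), the site data, and the cost formula are hypothetical, and the sketch assumes that sites fail independently and that network cost grows linearly with the distance between each pair of replicas and with the number of updates.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    """A hypothetical candidate location for a replica."""
    name: str
    failure_prob: float  # assumed probability that the site is unavailable
    storage_cost: float  # assumed cost of keeping one replica there per period


def availability(sites):
    """Probability that at least one replica is reachable.

    Assumes site failures are independent, which is a simplification.
    """
    p_all_down = 1.0
    for site in sites:
        p_all_down *= site.failure_prob
    return 1.0 - p_all_down


def total_cost(sites, distance_km, updates, cost_per_km_update):
    """Storage cost plus an assumed distance-proportional update cost."""
    storage = sum(site.storage_cost for site in sites)
    pairwise_km = sum(
        distance_km[frozenset((a.name, b.name))]
        for a, b in combinations(sites, 2)
    )
    return storage + pairwise_km * updates * cost_per_km_update


def cheapest_placement(candidates, distance_km, min_availability,
                       updates, cost_per_km_update):
    """Exhaustively search all non-empty subsets of candidate sites.

    Returns (sites, cost) for the cheapest subset that meets the
    availability target, or None if no subset does.
    """
    best = None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if availability(subset) < min_availability:
                continue
            cost = total_cost(subset, distance_km, updates, cost_per_km_update)
            if best is None or cost < best[1]:
                best = (subset, cost)
    return best


# Illustrative data only; the figures are made up for this sketch.
candidates = [
    Site("zurich", 0.01, 1.0),
    Site("frankfurt", 0.01, 1.0),
    Site("tokyo", 0.01, 1.5),
]
distance_km = {
    frozenset(("zurich", "frankfurt")): 300,
    frozenset(("zurich", "tokyo")): 9600,
    frozenset(("frankfurt", "tokyo")): 9400,
}

result = cheapest_placement(candidates, distance_km,
                            min_availability=0.999,
                            updates=100, cost_per_km_update=0.0001)
if result is not None:
    chosen, cost = result
    print([site.name for site in chosen], cost)
```

Because the independence assumption ignores correlated failures such as a regional disaster, this sketch favors nearby replicas whenever they meet the availability target. A model that captures the motivation above would also have to account for the correlation between geographically close sites, and a dynamic model would re-evaluate the placement as the update rate or the prices change.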


