A Client for Dynamic Replication and Partitioning of Big Data (Master Project, Finished)
Silvan Heller, Manuel Hürbin
The goal of this Master Project is to develop a client for a DBMS that dynamically replicates and partitions big data. The client should be able to stress the DBMS with different benchmarks and scenarios. To do so, the client should first identify how many client instances are needed; second, set DBMS- and benchmark-specific parameters on the DBMS; third, run the benchmark while measuring certain runtime values; and finally, process the collected measurements and upload the results.
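The first and last of these phases can be illustrated with a small sketch. It assumes a hypothetical sizing rule (each client instance runs at most a fixed number of worker threads), splits the operations of a benchmark run evenly across the instances, and computes overall throughput as the total number of operations divided by the slowest instance's runtime. The function names, the `ClientResult` type and the sizing rule are illustrative assumptions, not part of the project specification.

```python
import math
from dataclasses import dataclass


def required_clients(target_threads: int, threads_per_client: int) -> int:
    """Hypothetical sizing rule: each client instance runs at most
    `threads_per_client` worker threads."""
    if target_threads <= 0 or threads_per_client <= 0:
        raise ValueError("thread counts must be positive")
    return math.ceil(target_threads / threads_per_client)


def split_operations(total_ops: int, clients: int) -> list[int]:
    """Distribute total_ops as evenly as possible; the first
    `total_ops % clients` instances receive one extra operation."""
    base, extra = divmod(total_ops, clients)
    return [base + (1 if i < extra else 0) for i in range(clients)]


@dataclass
class ClientResult:
    operations: int    # operations completed by one client instance
    duration_s: float  # wall-clock runtime of that instance in seconds


def aggregate_throughput(results: list[ClientResult]) -> float:
    """Overall throughput in ops/s, assuming all instances start together:
    all operations divided by the runtime of the slowest instance."""
    total_ops = sum(r.operations for r in results)
    wall_time = max(r.duration_s for r in results)
    return total_ops / wall_time
```

For example, `required_clients(100, 32)` yields 4 instances, and `split_operations(10, 4)` assigns `[3, 3, 2, 2]` operations to them. Dividing by the slowest instance rather than averaging per-instance rates avoids overstating throughput when instances finish at different times.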
On the Software
- It should be possible to run the client as a Chronos-Agent (using the corresponding library).
- The client should integrate different benchmarks, such as several TPC benchmarks (e.g., TPC-C and TPC-H) and YCSB. It should also be easy to integrate custom benchmarks or "scenarios" (e.g., Hammer).
- The start parameters can be set via the command line, Chronos, or Properties files.
- A parameter should specify how the client accesses the DBMS; at least JDBC and HTTP/REST must be supported.
- The client should measure various timings (e.g., the execution time of a statement) and monitor its progress. There may be separate "measurement modules" for each access interface.
- The client should also collect metrics (e.g., CPU load and memory usage) from a list of hosts and store them in a way that allows them to be roughly mapped to the executed queries (e.g., via timestamps). The Chronos job should specify which metrics are stored and which machines are monitored. A tool such as netdata (https://github.com/firehol/netdata) should be used to gather the metrics.
- The client should follow a master-slave architecture so that a single benchmark job can use multiple client machines.
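The timestamp-based mapping between host metrics and executed queries could look roughly like the following sketch. It assumes that each query execution is logged with start and end timestamps and that the samples of one metric on one host (e.g., CPU load exported by netdata) are available as a sorted list of timestamps with matching values. `QueryExecution`, `samples_during` and `mean_metric_per_query` are hypothetical names; fetching the samples from netdata is left out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryExecution:
    query_id: str
    start: float  # Unix timestamp in seconds (assumed to be logged by the client)
    end: float


def samples_during(query: QueryExecution, sample_times: list[float],
                   sample_values: list[float]) -> list[float]:
    """Return the metric values sampled within [start, end] of the query.
    `sample_times` must be sorted in ascending order."""
    lo = bisect_left(sample_times, query.start)
    hi = bisect_right(sample_times, query.end)
    return sample_values[lo:hi]


def mean_metric_per_query(queries: list[QueryExecution], sample_times: list[float],
                          sample_values: list[float]) -> dict[str, Optional[float]]:
    """Average the metric over each query's execution window.
    Queries shorter than the sampling interval may see no sample (None)."""
    result: dict[str, Optional[float]] = {}
    for q in queries:
        window = samples_during(q, sample_times, sample_values)
        result[q.query_id] = sum(window) / len(window) if window else None
    return result
```

With samples taken once per second, a query running from t = 100.5 to t = 102.5 is matched to the samples at t = 101 and t = 102, whereas a query lasting only a fraction of a second may fall between two samples and receive no value. This is why such a mapping can only be rough. Binary search keeps each lookup logarithmic in the number of samples.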
On the Project
- Collection of relevant parameters (e.g., the access protocol, the parameters of the benchmarks/scenarios, the concurrency control protocol the DBMS should use, and the workload).
- Collection of the runtime values the client will measure (e.g., throughput and the time spent in the locking phase).
- Description of the software architecture.
- Milestone presentation covering the project plan, the distribution of topics and responsibilities, and the proposed architecture, benchmarks, dimensions, and parameters.
- Final presentation on the results/outcome of the project.
- Creation of plots using the gathered measurements
- Visualization of the system, client, progress, etc.
- Final project report
- Presentation slides
- Source code
The project is designed as a group activity for two students. In the first phase of the project, the group should work out a detailed project plan and the distribution of topics and responsibilities within the group.
Start / End Dates
2017/02/15 - 2017/07/31