Towards Archiving-as-a-Service: A Distributed Index for the Cost-effective Access to Replicated Multi-Version Data

Authors
Filip-M. Brinkmann and Heiko Schuldt
Type
In Proceedings
Date
2015/7
Appears in
Proceedings of the 19th International Database Engineering and Applications Symposium (IDEAS '15)
Location
Yokohama, Japan
Abstract
With the advent of data Clouds that combine nearly unlimited storage capacity with low storage costs, the well-established update-in-place paradigm for data management is increasingly being replaced by a multi-version approach. Especially in a Cloud environment with several geographically distributed data centers that act as replica sites, this makes it possible to keep old versions of data and thus to provide a rich set of read operations with different semantics (e.g., read most recent version, read version not older than, read data as of). A combination of multi-version data management, replication, and partitioning makes it possible to redundantly store several or even all versions of data items without significantly impacting any single site. However, to prevent individual sites in such partially replicated data Clouds from being overloaded when processing archive queries that access old versions, query optimization has to jointly consider version selection and load balancing (site selection). In this paper, we introduce ARCTIC, a novel cost-aware index for version and site selection for a broad range of query types covering both fresh data and archive data. We describe in detail the interplay between the different parts of the index and their implementation. Moreover, we present an evaluation of the combined version and replica index in a Cloud environment, which shows a significant gain in query throughput compared to a monolithic index.
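To make the three read semantics and the joint version and site selection named in the abstract more concrete, the following is a minimal sketch in Python. It is not the ARCTIC index and is not taken from the paper: the `Version` record, the integer timestamps, the `load` dictionary, and all function names are assumptions introduced purely for illustration, and the "cost" is reduced to a single load number per site.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    # Hypothetical record: one stored version of a data item.
    timestamp: int   # commit time on an assumed integer clock
    sites: tuple     # names of the replica sites holding this version


def read_most_recent(versions):
    """Newest version, or None. `versions` must be sorted by timestamp."""
    return versions[-1] if versions else None


def read_as_of(versions, t):
    """Newest version with timestamp <= t, or None if none exists."""
    i = bisect_right([v.timestamp for v in versions], t)
    return versions[i - 1] if i else None


def read_not_older_than(versions, t):
    """All versions with timestamp >= t; any of them satisfies the read."""
    i = bisect_left([v.timestamp for v in versions], t)
    return versions[i:]


def select_version_and_site(candidates, load):
    """Among acceptable versions, pick the (version, site) pair whose
    site currently has the lowest load (assumed stand-in for cost)."""
    pairs = [(v, s) for v in candidates for s in v.sites]
    return min(pairs, key=lambda p: load[p[1]]) if pairs else None
```

In this sketch a "not older than" read leaves the optimizer free to choose among several versions, so it can be routed to a lightly loaded site, whereas an "as of" read fixes the version first and only the site remains to be chosen.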