A Benchmark for RDF-based Metadata Management in Distributed Long-Term Digital Preservation

Ivan Subotic, Lukas Rosenthaler, Heiko Schuldt
In Proceedings
Appears in
Proceedings of the 3rd International Workshop on Data Engineering Meets the Semantic Web (DESWEB)
Washington, D.C., USA
In a large variety of applications, the long-term, guaranteed availability of data is becoming increasingly important. Thus, long-term digital preservation systems have to be inherently distributed to allow content to be replicated. This affects both the preservation of the actual digital objects and their associated metadata. For the latter, RDF has become the prevalent data model. Ensuring data integrity and consistency requires periodic checks that detect inconsistencies in a timely manner, for instance those caused by (partial) hardware failures, and trigger repair actions. Hence, the access characteristics of metadata in long-term digital preservation differ significantly from those of metadata management in other types of applications. In addition, the increasing size of digital archives makes consistency checks of the associated metadata more challenging. In this paper, we introduce a novel benchmark for triple store-based metadata management that jointly takes into account the specific access patterns of long-term preservation systems: i) complex periodic consistency checks, ii) concurrent read and write requests to the archive, and iii) the actions to be taken on data to re-establish consistency once a violation has been detected. Furthermore, we present the results of this benchmark applied to our distributed long-term digital preservation system DISTARNET.