ADAM — A Database and Information Retrieval System for Big Multimedia Collections
The past decade has seen the rapid proliferation of low-priced devices for recording image, audio and video data in nearly unlimited quantity. Multimedia is Big Data, not only in terms of its volume, but also with respect to its heterogeneous nature, which extends to the variety of the queries to be executed. Current approaches for searching in big multimedia collections mainly rely on keywords. However, manually annotating every single object in a large collection is not feasible. Therefore, content-based multimedia retrieval (using sample objects as query input) is increasingly becoming an important requirement for dealing with the data deluge. In image databases, for instance, effective methods use exemplary images or hand-drawn sketches as query input. In this paper, we introduce ADAM, a novel multimedia retrieval system that is tailored to large collections and supports both Boolean retrieval for structured data and similarity-based retrieval for feature vectors extracted from the multimedia objects. For efficient query processing in such big multimedia data, ADAM allows the indexed collection to be distributed across multiple shards and executes queries in a MapReduce style. Furthermore, it supports a signature-based indexing strategy for similarity search that substantially reduces query time. The efficiency of ADAM has been successfully evaluated in a content-based image retrieval application on the basis of 14 million images from the ImageNet collection.
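To make the query-processing idea concrete, the following is a minimal sketch of a sharded similarity query in a MapReduce style with a signature-based filter step. It is an illustration under stated assumptions, not ADAM's actual implementation: the abstract does not specify the signature scheme, so the one-bit-per-dimension signature, the Hamming-distance filter, the Euclidean refinement, and all names (`build_shards`, `search_shard`, `merge_results`, `knn_query`, `n_candidates`) are hypothetical.

```python
import heapq
import math


def signature(vec, thresholds):
    # Assumed scheme: one bit per dimension, set when the value exceeds a threshold.
    return tuple(1 if v > t else 0 for v, t in zip(vec, thresholds))


def hamming(a, b):
    # Number of positions in which two signatures differ.
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_shards(collection, n_shards, thresholds):
    # Distribute (id, vector) pairs round-robin and precompute each signature.
    shards = [[] for _ in range(n_shards)]
    for i, (obj_id, vec) in enumerate(collection):
        shards[i % n_shards].append((obj_id, vec, signature(vec, thresholds)))
    return shards


def search_shard(shard, query, k, thresholds, n_candidates):
    # Map step, run independently per shard.
    q_sig = signature(query, thresholds)
    # Filter: keep the objects whose signatures are closest to the query signature.
    candidates = heapq.nsmallest(
        n_candidates, shard, key=lambda item: hamming(item[2], q_sig)
    )
    # Refine: compute exact distances only for the surviving candidates.
    scored = [(euclidean(vec, query), obj_id) for obj_id, vec, _ in candidates]
    return heapq.nsmallest(k, scored)


def merge_results(partials, k):
    # Reduce step: merge the per-shard top-k lists into a global top-k.
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


def knn_query(shards, query, k, thresholds, n_candidates=10):
    partials = [search_shard(s, query, k, thresholds, n_candidates) for s in shards]
    return merge_results(partials, k)


# Toy data: four 2-dimensional feature vectors split across two shards.
collection = [
    ("a", [0.1, 0.9]),
    ("b", [0.8, 0.2]),
    ("c", [0.15, 0.85]),
    ("d", [0.9, 0.1]),
]
thresholds = [0.5, 0.5]
shards = build_shards(collection, 2, thresholds)
result = knn_query(shards, [0.1, 0.9], 2, thresholds)
nearest_ids = [obj_id for _, obj_id in result]
```

In this sketch the per-shard searches run sequentially; in a distributed setting each `search_shard` call would run on the node holding that shard, and only the small top-k lists would be sent to the merging step. The filter step trades accuracy for speed: a smaller `n_candidates` means fewer exact distance computations but a higher chance of missing a true nearest neighbour.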