Multimedia Information Retrieval

The past decade has seen the rapid proliferation of low-priced devices for recording image, audio, and video data in nearly unlimited quantity. Multimedia data are Big Data, not only in terms of their volume but also with respect to their heterogeneous nature. This creates new challenges for managing multimedia collections and for searching within them beyond well-established keyword search based on manual annotations.


Work at the Databases and Information Systems Group in the research area Multimedia Information Retrieval addresses the following challenges. It is carried out in the context of prototype systems, the most prominent one being vitrivr:

  1. Sketch-based Retrieval: Most current approaches to image retrieval rely on either keywords (which are evaluated against mainly manually provided metadata) or dedicated query objects (query-by-example). In our work, we consider query-by-sketch, an approach that allows users to specify (parts of) the image they are searching for on the basis of a color and/or edge sketch. Particular attention will be given to user interfaces and devices, partial matches, and invariances in terms of rotation, displacement, scale, etc.
  2. Data management, metadata management, and indexing for large multimedia collections: Multimedia collections are usually very large and can, at the same time, be quite heterogeneous. This applies both to the actual content and to the metadata, which can be highly sophisticated (i.e., high-dimensional), depending on the features considered. In our work, which aims at seamlessly integrating concepts from database systems and from information retrieval systems, we support both structured and unstructured data and metadata, different retrieval modes, different indexing approaches, and different approaches to data distribution and replication to best support a broad range of multimedia queries. Our Cottontail DB system seamlessly combines several index structures and provides so-called progressive queries, an approach that is tailored to similarity search in very large multimedia collections.
  3. Multi-modal multimedia retrieval: Users of multimedia systems increasingly aim at making use of different modalities when querying content. Most existing multimedia retrieval systems, however, are mono-modal or limited to very few modalities. In particular, they do not allow for the seamless combination of modalities, either at the same time or in a pipelined fashion, when processing queries. Our work aims at the development of a multimedia retrieval back-end that is able to jointly support a large variety of query types, e.g., keyword queries on manual annotations, query-by-example with given query objects, query-by-sketch capturing contents of objects in user-provided sketches, or any other type of query (e.g., spatio-temporal motion queries for videos), allowing for truly multi-modal user interaction. This will allow users to specify queries with several different query objects coming from different modalities, and it will support interactive query processes in which users refine queries by adding new query objects step by step or by changing the modalities and/or features to be considered. Collections to be considered will consist of curated content, crowdsourced content, and any combination thereof. The Cineast system is an innovative and functionally very rich multimedia retrieval engine that supports multi-modal user interaction.
  4. Open source multimedia retrieval system development: With the tremendous increase in video recording devices and the resulting abundance of digital video, finding a particular video sequence in ever-growing collections is increasingly becoming a major challenge. Existing approaches to video retrieval mostly still rely on text-based retrieval techniques to find desired sequences. With vitrivr, we have open-sourced our Cineast retrieval engine and the ADAM database back-end in order to encourage a large and creative community of open source developers to actively participate in the development.
  5. The exploitation of multimedia analysis and similarity search for the (real-time) detection and analysis of digital disinformation on the web ("fake news detection"): While most approaches to fake news detection focus on text analysis, the verifir project addresses the analysis of multimedia content (images, videos) and also analyzes how digital disinformation communicated via images and/or video spreads across various social media channels.
  6. User interaction based on Augmented Reality / Virtual Reality: The presentation of search results can benefit from novel user interfaces and novel types of user experiences. In a mobile context, for instance when searching in historic multimedia collections in a touristic setting as in the City-Stories project, results can be presented using concepts from augmented reality (AR). This makes it possible to superimpose historic content on the current view, as seen through the camera of a mobile device. Another way to present the results of a similarity search is to exploit concepts from virtual reality (VR) to project them into a virtual space, for instance in a virtual museum as in the VIRTUE project.
  7. Motion features for motion-based video retrieval: In video collections, motion can be essential information for characterizing content (e.g., the motion of an object across different frames). In this work, we address the exploitation of motion-related information for multimedia retrieval. Examples come from sports, especially the search for ball or player motion in sports videos.
  8. Similarity search in digitized paper watermarks: In the field of historical document analysis, watermarks are very important features as they provide information about the paper mill, the paper manufacturer, the production tools, and thus the period of time when the paper was produced. Over the centuries, thousands of watermarks were manufactured in different countries. In order to find out when a particular piece of paper was produced, historians compare the watermark embedded within the paper with a ‘database’ of known watermarks and their variations – where ‘database’ in most cases means a collection of watermark illustrations in dedicated textbooks. Hence, age determination is a manual process in which human experts perform a ‘similarity search’. Since most paper mills applied numerous minor variations to their watermarks over time, this is a tedious process. In our work, we apply sketch-based image retrieval to support historians in their task of finding the proper reference watermark for a given piece of paper. This is significantly supported by interactive paper user interfaces and a smartphone app for easy sketch generation.
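
As a minimal illustration of the multi-modal querying described in item 3, the following Python sketch combines the similarity scores returned by several query modalities into a single ranking using a weighted average (often called late fusion). It is not taken from Cineast or any other vitrivr component: the function name `fuse_scores`, the score dictionaries, the item identifiers, and the weights are all assumptions made for illustration, and a real system may well fuse results differently.

```python
from collections import defaultdict


def fuse_scores(results_per_modality, weights=None):
    """Hypothetical late fusion: weighted average of per-modality scores.

    results_per_modality maps a modality name (e.g. "sketch") to a dict of
    {item_id: similarity score in [0, 1]}. An item missing from a modality
    contributes a score of 0 for that modality.
    """
    if weights is None:
        # Assumption: all modalities are equally important by default.
        weights = {modality: 1.0 for modality in results_per_modality}

    total_weight = sum(weights[m] for m in results_per_modality)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = defaultdict(float)
    for modality, scores in results_per_modality.items():
        normalized_weight = weights[modality] / total_weight
        for item_id, score in scores.items():
            fused[item_id] += normalized_weight * score

    # Highest fused score first; ties are broken by item id for determinism.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Illustrative (made-up) scores for a query-by-sketch and a keyword query.
results = {
    "sketch": {"video_1": 0.9, "video_2": 0.4},
    "keyword": {"video_2": 0.8, "video_3": 0.6},
}

equal_ranking = fuse_scores(results)
sketch_heavy_ranking = fuse_scores(results, {"sketch": 3.0, "keyword": 1.0})

print([item_id for item_id, _ in equal_ranking])
print([item_id for item_id, _ in sketch_heavy_ranking])
```

Changing the weights mimics, in a very simplified way, the interactive refinement mentioned above: shifting emphasis towards the sketch moves items that match the sketch well further up the ranking.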



Research Projects

Thesis Projects