Automated Collection Generation and Organization (Master Project, Finished)


Simon Peterhans


This project's context is twofold. Firstly, the VIRTUE project, initiated by researchers of the DBIS group, provides a full system to curate and visit VR exhibitions. The open-source system consists of the end-user front end, Virtual Reality Exhibition Presenter (VREP); a back end for managing and editing such exhibitions, Virtual Reality Exhibition Manager (VREM); its corresponding user interface for, e.g., curators, VREMui; and a database (MongoDB).

Up to now, collections had to be curated manually before they could be displayed in VIRTUE. This is easy for small collections, but very time-consuming even for collections with only a few hundred media items.

Secondly, this project is situated within the PIA project (Participatory Knowledge Practices in Analog and Digital Image Archives), whose main objective is to allow participatory use of large cultural heritage image archives.

To be able to use such large image archives in any practical way, even just for exploration, some form of curation is required.

Cultural heritage data collections are often very large; in the context of PIA, around 55'000 images have already been digitized, with many more to be added.



In both of the described contexts, there is a need for automatic collection generation and organization in large data sets.

The goal of this project is to research and implement different methods for the automatic generation of coherent collections from large multimedia corpora and to maintain metadata on different sub-collections, depending on user-defined criteria.

The project will focus both on methods that automatically generate collections based on an initial user selection and on methods based entirely on patterns automatically detected within the data. Membership in such dynamically created collections will be part of the collection metadata, so that specific collections or exhibitions can be recreated at a later stage.
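
To make the idea of storing membership as metadata more concrete, the following is a minimal sketch and not the project's actual data model. The field names (`collection_id`, `criteria`, `member_ids`) and the JSON layout are assumptions chosen for illustration only; the format actually required by VREM is not reproduced here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class CollectionRecord:
    """Hypothetical metadata record for one generated collection."""
    collection_id: str
    criteria: dict  # user-defined criteria that produced the collection
    member_ids: list = field(default_factory=list)  # IDs of the member media items


def save_record(record: CollectionRecord) -> str:
    """Serialize the record so the collection can be recreated later."""
    return json.dumps(asdict(record), sort_keys=True)


def load_record(payload: str) -> CollectionRecord:
    """Rebuild the record from its serialized form."""
    return CollectionRecord(**json.loads(payload))


# Illustrative values only; the IDs and criteria are made up.
record = CollectionRecord(
    "c-001",
    {"method": "seed-similarity", "k": 3},
    ["img_17", "img_42"],
)
restored = load_record(save_record(record))
```

Keeping the generating criteria next to the member list means a collection can either be restored exactly from its stored members or regenerated from the criteria once the underlying archive has grown.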

To facilitate an initial user selection, a random sample from the dataset may be presented, or more deliberate methods such as self-organizing maps may be used. As a foundation for more sophisticated approaches, Cineast, the retrieval engine of vitrivr, shall be used to support exhibition generation on the basis of, e.g., visual similarity.
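
The two steps described above, presenting a random sample and then growing a collection from the user's selection by similarity, could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the two-dimensional vectors are toy stand-ins for the visual features a retrieval engine would supply, cosine similarity to the centroid of the selection is just one possible ranking, and no Cineast API is used.

```python
import math
import random


def random_preview(item_ids, n, seed=None):
    """Draw up to n distinct items to show the user as a starting point."""
    rng = random.Random(seed)
    ids = list(item_ids)
    return rng.sample(ids, min(n, len(ids)))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_selection(features, selected, k):
    """Return the k unselected items most similar to the centroid of the selection.

    `selected` must contain at least one ID that is present in `features`.
    """
    dims = len(next(iter(features.values())))
    centroid = [
        sum(features[i][d] for i in selected) / len(selected) for d in range(dims)
    ]
    candidates = [i for i in features if i not in selected]
    candidates.sort(key=lambda i: cosine(features[i], centroid), reverse=True)
    return candidates[:k]


# Toy feature vectors (assumed); real features would come from the retrieval engine.
features = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.8, 0.3],
}
preview = random_preview(features, 2, seed=0)
suggested = expand_selection(features, ["a"], k=2)
```

With the toy data, selecting item `a` ranks `b` and `d` ahead of `c`, because their vectors point in nearly the same direction as that of `a`.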

Ultimately, the methods should be integrated into the VIRTUE system so that collections can be generated automatically in the format required by VREM, stored persistently like regular, manually curated collections, and displayed through VIRTUE.

Start / End Dates

2021/05/17 - 2021/10/01


Research Topics