Indexing and Retrieval of Semantic Sub-Concepts in 3D Scenes (Master's Thesis, Ongoing)
Author
Description
Multimedia retrieval focuses on indexing and retrieving media by computing features that embed each item into a feature space. These feature spaces range from simple domains, such as colour spaces, to highly complex, high-dimensional spaces. Recently, semantic feature spaces, which represent meaning as high-dimensional vectors, have gained considerable attention and proven highly effective. Since distance metrics can be applied to such vector embeddings, media items become comparable and thus searchable.
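The retrieval principle described above can be illustrated with a minimal sketch. The item names and embedding values below are purely hypothetical stand-ins; in practice the vectors would come from a learned encoder, and the index would be far larger.

```python
import numpy as np

# Hypothetical index: each media item is a point in a semantic feature space.
# The names and 3-dimensional vectors are illustrative stand-ins only.
index = {
    "bunny":      np.array([0.9, 0.1, 0.0]),
    "teddy_bear": np.array([0.8, 0.3, 0.0]),
    "hutch":      np.array([0.1, 0.0, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: np.ndarray, k: int = 2) -> list[tuple[str, float]]:
    """Rank indexed items by similarity to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# A query embedding (e.g. produced from a text query by the same encoder).
results = search(np.array([0.9, 0.1, 0.0]))
print(results)
```

Because all items live in the same space, any query that can be embedded, whether text, image, or a rendered 3D view, can be compared against the index with the same distance metric.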
By applying semantic methods across various media types, including text, images, and videos, a multimodal approach is realized, enabling comparison and retrieval across different domains.
The work presented in Cross-Modal 3D Model Retrieval proposes such an approach specifically for 3D models, where individual models are rendered and indexed using visual-text co-embedding to facilitate semantic search.
However, this work employs a classical retrieval approach comparable to image or video search, where a large indexed collection is queried against individual search requests. This method is particularly suitable for media such as texts, images, or videos, as these are usually organized in linear collections. Such collections, including documents, albums, or series, generally offer a single linear degree of freedom, primarily enabling forward and backward navigation.
In contrast, the natural organization of 3D models does not follow a linear collection but rather a spatial arrangement within three-dimensional compositions, known as scenes. These arrangements inherently possess seven degrees of freedom (three translational, three rotational, and one of scale). Furthermore, the spatial configuration of individual objects significantly influences their semantic interpretation, giving rise to emergent effects.
For instance, a bunny placed in a children's room next to a teddy bear conveys a different semantic meaning than the same bunny placed in a hutch in the garden.
When considering complex scenes composed of many sub-scenes and 3D models, it becomes clear that there is not just one semantic concept for the entire scene. Instead, numerous sub-concepts emerge, which together form the overall concept of the scene.
This project investigates how sub-parts of complex 3D scenes can be made searchable.
The goal is to research and develop indexing strategies for scenes, where applying semantic embeddings to partial regions enables the indexing and retrieval of semantic sub-scenes. This preparation report forms the foundation on which the master's thesis will be built.
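One conceivable indexing strategy, sketched below purely for illustration and not as the thesis' final design, is to group spatially close objects into candidate regions, embed each region, and index the region embeddings so that retrieval can return a sub-scene rather than the whole scene. All names, the naive clustering radius, and the mean-pooling "encoder" are hypothetical stand-ins.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: np.ndarray   # object centre in scene coordinates (stand-in values)
    embedding: np.ndarray  # per-object semantic embedding (stand-in values)

def group_into_regions(objects: list[SceneObject], radius: float = 2.0):
    """Naive spatial grouping: objects within `radius` of a seed form one region."""
    regions, remaining = [], list(objects)
    while remaining:
        seed = remaining.pop(0)
        region, rest = [seed], []
        for obj in remaining:
            if np.linalg.norm(obj.position - seed.position) <= radius:
                region.append(obj)
            else:
                rest.append(obj)
        remaining = rest
        regions.append(region)
    return regions

def region_embedding(region: list[SceneObject]) -> np.ndarray:
    """Stand-in for a learned sub-scene encoder: mean of member embeddings."""
    return np.mean([o.embedding for o in region], axis=0)

# Toy scene: bunny and teddy bear sit close together, the hutch is far away.
scene = [
    SceneObject("bunny",      np.array([0.0, 0.0, 0.0]),  np.array([0.9, 0.1])),
    SceneObject("teddy_bear", np.array([1.0, 0.0, 0.0]),  np.array([0.8, 0.2])),
    SceneObject("hutch",      np.array([10.0, 0.0, 0.0]), np.array([0.1, 0.9])),
]

# Each index entry pairs a sub-scene (its member objects) with one embedding,
# making the sub-scene, not the full scene, the unit of retrieval.
index = [(tuple(o.name for o in r), region_embedding(r))
         for r in group_into_regions(scene)]
print([names for names, _ in index])
```

In this toy example the bunny and teddy bear fall into one region and the hutch into another, so a query could match the "children's room" sub-scene without matching the scene as a whole, which mirrors the emergent-semantics observation above.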
Start / End Dates
2025/11/03 - 2026/05/02