Representative Viewpoint Selection for 3D Model Retrieval (Master Thesis, Ongoing)

Author

Samuel Börlin

Description

With the increasing abundance of publicly available 3D models, it is becoming ever more important to be able to search through such content. Using search engines is an everyday activity nowadays, but it is predominantly restricted to text and image retrieval. Despite advancements in text and image retrieval, methods for 3D model retrieval lag behind due to the inherently greater complexity and degrees of freedom of 3D models. However, well-established image retrieval methods can also be applied to 3D models by first rendering the models into images and then performing retrieval on those images. This is often done using neural networks such as Inception-ResNet or CLIP, which encode images into lower-dimensional, less complex feature vectors. Additionally, and very importantly for retrieval, CLIP also bridges the semantic gap between images and text. The quality of these feature vectors, however, depends on the viewpoints selected when rendering images of the 3D model. This thesis therefore explores a new approach to finding viewpoints that are useful and representative for 3D model retrieval with models that produce semantic embeddings, such as CLIP. We investigate whether a single viewpoint suffices for this purpose or whether multiple viewpoints are beneficial, and how to optimally combine multiple embeddings for retrieval. Our proposed method is largely based on clustering of neural radiance fields trained to predict CLIP embeddings in 3D space. We evaluate our approach based on classification and retrieval performance on the Objaverse dataset.
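The render-encode-retrieve pipeline described above can be illustrated with a small sketch. This is not the thesis method: the vectors below are random stand-ins for CLIP image embeddings, and mean pooling of normalized per-view vectors is just one simple strategy for combining multiple viewpoint embeddings into a single descriptor.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    # Normalize vectors so cosine similarity reduces to a dot product.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def aggregate_views(view_embeddings):
    # Combine per-viewpoint embeddings into one descriptor by averaging
    # the normalized vectors (one simple aggregation strategy).
    return l2_normalize(l2_normalize(view_embeddings).mean(axis=0))


def retrieve(query, gallery, k=3):
    # Rank gallery descriptors by cosine similarity to the query.
    sims = gallery @ l2_normalize(query)
    order = np.argsort(-sims)
    return order[:k], sims[order[:k]]


rng = np.random.default_rng(0)
dim = 8  # toy dimensionality; real CLIP embeddings are e.g. 512-dimensional

# Pretend each of 5 models was rendered from 4 viewpoints and encoded.
models = [rng.normal(size=(4, dim)) for _ in range(5)]
gallery = np.stack([aggregate_views(v) for v in models])

# A query built from slightly perturbed views of the first model
# should retrieve that model first.
query = aggregate_views(models[0] + 0.05 * rng.normal(size=(4, dim)))
idx, scores = retrieve(query, gallery)
```

In a real pipeline, `aggregate_views` would receive actual CLIP embeddings of rendered images, and the choice of viewpoints fed into it is exactly what the thesis investigates.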

Start / End Dates

2024/09/16 - 2025/04/14

Supervisors

Research Topics