Cross-Modal 3D Model Retrieval

Authors
Raphael Waltenspül, Florian Spiess, Heiko Schuldt
Type
In Proceedings
Date
2024/12
Appears in
Proceedings of the 26th International Symposium on Multimedia (ISM'24)
Location
Tokyo, Japan
Abstract

Within the domain of multimedia formats, 3D models – alongside images, videos, and texts – are rapidly gaining prominence in applications and research. As the uses of 3D models grow and capturing and digitally creating them becomes easier and more accessible, methods are required for automatic analysis and retrieval within large collections. While much previous work has focused on geometry-based retrieval, these methods usually require an example 3D model as the query and often struggle to capture appearance information expressed through textures and materials. In this paper, we propose a novel view-based approach that embeds 3D models into multi-modal embedding spaces by rendering 2D planar projections. Furthermore, our approach addresses the challenge of viewpoint selection for 3D model rendering, so as to maximize the semantic recognizability of the rendered model. We demonstrate the capability of our approach using two pre-trained multi-modal embedding models applied to a large collection of modern 3D models.
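The retrieval scheme the abstract outlines – render 2D projections of each 3D model, embed the renders with a pre-trained multi-modal model, and match them against queries in the shared embedding space – can be sketched roughly as follows. This is a minimal schematic, not the authors' implementation: `embed` is a deterministic hashing stand-in for a real multi-modal encoder (the abstract does not name the models used), and the "views" are string placeholders for rendered images.

```python
import zlib
import numpy as np

def embed(x: str, dim: int = 16) -> np.ndarray:
    """Stand-in for a pre-trained multi-modal embedding model.
    A real system would embed rendered view images and query text
    with the same encoder (e.g. a CLIP-style model); here we just
    hash the input to a deterministic unit vector for illustration."""
    rng = np.random.default_rng(zlib.crc32(x.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def embed_3d_model(views: list[str]) -> np.ndarray:
    """Embed each rendered 2D planar projection and mean-pool the
    results, placing the 3D model in the same embedding space as
    image and text queries."""
    pooled = np.stack([embed(v) for v in views]).mean(axis=0)
    return pooled / np.linalg.norm(pooled)

def retrieve(query: str, collection: dict[str, list[str]], k: int = 2) -> list[str]:
    """Rank 3D models by cosine similarity between the query embedding
    and each model's pooled view embedding (all vectors unit-norm,
    so the dot product equals cosine similarity)."""
    q = embed(query)
    scores = {name: float(q @ embed_3d_model(views))
              for name, views in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A real pipeline would render the projections with a 3D engine, embed them with the actual encoder, and additionally score candidate viewpoints so that the most semantically recognizable renders are used for the pooled representation.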

Research Projects