Context-Aware Multimedia Retrieval for XR Environments (PhD Thesis, ongoing)
Extended reality (XR) has created new ways for us to interact with our environment and offers a range of immersive experiences. The XR continuum spans from the real world, with no immersion, through augmented reality (AR), where digital objects are superimposed onto the real-world environment, and mixed reality (MR), which combines interactive digital objects with the real world, to virtual reality (VR), where users are fully immersed in a completely digital environment. XR provides opportunities in education, healthcare, retail, and cultural heritage by overlaying relevant, personalized, and interactive information in real time.
To enable these XR experiences, content is needed. In this research, we focus on context-aware multimedia retrieval for XR environments. The research addresses key challenges in multimedia retrieval: the cognitive gap (user intent vs. system interpretation), the modality gap (retrieving across diverse media types), and the pragmatic gap (aligning human mental models with machine categorizations).
For our research, we propose the following five research questions:
- How can real-world signals be transformed into effective queries for multimedia retrieval in XR?
- How do spatial, temporal, and semantic contexts improve retrieval in XR?
- How should retrieved digital objects be presented in XR to balance immersion and usability?
- In what ways can multimedia retrieval enhance user experience in XR?
- What frameworks enable seamless integration and evaluation of multimedia retrieval in XR systems?
The XR environment plays an essential role at two stages: query time and browsing time. First, multimodal user signals, such as gaze, gestures, spoken queries, and contextual information, must be translated into retrieval queries; this requires segmenting and analyzing the real world in real time, for instance to recognize the user's gestures. Second, the digital content representing the results must be meaningfully presented to the user, so it is crucial to determine where and how digital objects are placed: content can either be anchored relative to another object or positioned freely in XR space. We therefore investigate how real-world interactions can seamlessly connect with digital media, both at query time and at browsing time.
To demonstrate parts of the model's functionality, a prototype system is implemented and evaluated on different XR hardware, such as the Apple Vision Pro, the XREAL glasses, and the Meta Quest.
Our contributions can thus be framed in three parts: conceptual, technical, and empirical.
Staff
Examiners
Prof. Dr. Heiko Schuldt, Prof. Dr. Florina M. Ciorba
Start Date
2022-11-15