Deepmime: Gesture Recognition and Retrieval System (Ongoing)
Gestures are an inseparable part of everyday communication and a crucial component of human-machine interaction. Several capture methods make gesture recognition and detection easier, for instance color-based sensing, motion sensors, or depth cameras. In everyday practice, however, gestures are recorded with ordinary cameras, and many applications must recognize gestures under these conditions. The goal of the Deepmime Project is to develop a system that performs real-time gesture recognition based solely on RGB camera feeds.
The Deepmime System is envisioned to have four components:
- Preprocessing: Gestures are often performed in noisy, cluttered environments. The system must therefore suppress clutter and focus on the person performing the gesture. Additionally, a video may contain a sequence of gestures, so individual gestures must be temporally localized.
- Feature extraction: The core of Deepmime extracts spatio-temporal features from the preprocessed video input and uses them for gesture classification. This component is based on Deep Learning techniques, which learn to optimally extract and classify the features.
- Similarity Search and Retrieval: In addition to classifying gesture videos, Deepmime has a retrieval component that finds results similar to a query, based on the features produced by the feature-extraction component. The query can be a video containing a hand gesture or a textual query referring to a particular gesture type.
- User Interface: The front-end of the system takes the user queries (recorded video, real-time webcam input, and text), sends them to the backend where classification or retrieval is performed, and displays the results in a user-friendly interface. One of the research goals of Deepmime is adapting the search results to the user's needs based on expert user feedback.
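The retrieval component described above can be sketched as nearest-neighbor search over the extracted feature vectors. The following is a minimal NumPy illustration under the assumption that each indexed clip and the query are represented as fixed-length feature vectors compared by cosine similarity; the actual Deepmime features and index structure are not specified here.

```python
import numpy as np

def cosine_retrieve(query_vec, gallery, k=3):
    """Return indices of the k gallery vectors most similar to the query.

    query_vec: (d,) feature vector of the query gesture clip.
    gallery:   (n, d) matrix of feature vectors for the indexed clips.
    Both are assumed to come from the feature-extraction component;
    here they are plain arrays for illustration.
    """
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                    # cosine similarity per indexed clip
    return np.argsort(-sims)[:k]    # best matches first

# Toy example: four indexed "clips" with 3-d features.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(cosine_retrieve(query, gallery, k=2))  # → [0 2]
```

In a full system the gallery would be held in an index that supports approximate nearest-neighbor search, so that retrieval stays interactive as the collection grows.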
University of Mons, Belgium
- Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont: "vitrivr and Deepmime: Gesture-based search in Video Collections". In: Proceedings of the International Conference on Multimodal Communication (ICMC): Developing New Theories and Methods, Osnabrück, Germany, May 2020.