Deepmime: Gesture Recognition and Retrieval System (Ongoing)
Gestures are an inseparable part of everyday communication and a crucial component of human-machine interaction. Several capture methods make gesture recognition and detection easier, for instance color-based sensing, motion sensors, or depth cameras. In everyday practice, however, gestures are recorded with ordinary cameras, and many applications must recognize gestures under these conditions. The goal of the Deepmime Project is to develop a system that performs real-time gesture recognition based solely on RGB camera feeds.
The Deepmime System is envisioned to have four components:
- Preprocessing: Gestures are often performed in noisy, cluttered environments. The system must therefore suppress clutter and focus on the person performing the gesture. Additionally, a video may contain a sequence of gestures, so individual gestures must be temporally localized.
- Feature extraction: The core of Deepmime extracts spatio-temporal features from the preprocessed video input and uses them for gesture classification. This component is based on Deep Learning techniques, which learn to optimally extract and classify the features.
- Similarity Search and Retrieval: In addition to classifying gesture videos, Deepmime has a retrieval component that finds results similar to a query, based on the features produced by the feature-extraction component. The query can be a video containing a hand gesture or a textual query referring to a particular gesture type.
- User Interface: The front-end of the system takes the user queries (recorded video, real-time webcam input, and text), sends them to the backend where classification or retrieval is performed, and displays the results in a user-friendly interface. One of the research goals of Deepmime is adapting the search results to the user's needs based on expert user feedback.
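The retrieval component described above can be sketched as nearest-neighbor search over the extracted feature vectors. The following is a minimal NumPy illustration under the assumption that each indexed clip and the query are represented as fixed-length feature vectors compared by cosine similarity; the actual Deepmime features and index structure are not specified here.

```python
import numpy as np

def cosine_retrieve(query_vec, gallery, k=3):
    """Return indices of the k gallery vectors most similar to the query.

    query_vec: (d,) feature vector of the query gesture clip.
    gallery:   (n, d) matrix of feature vectors for the indexed clips.
    Both are assumed to come from the feature-extraction component;
    here they are plain arrays for illustration.
    """
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                    # cosine similarity per indexed clip
    return np.argsort(-sims)[:k]    # best matches first

# Toy example: four indexed "clips" with 3-d features.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(cosine_retrieve(query, gallery, k=2))  # → [0 2]
```

In a full system the gallery would be held in an index that supports approximate nearest-neighbor search, so that retrieval stays interactive as the collection grows.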
University of Mons, Belgium
- Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont: "vitrivr and Deepmime: Gesture-based search in Video Collections". In: Proceedings of the International Conference on Multimodal Communication (ICMC): Developing New Theories and Methods, Osnabrück, Germany, May 2020.