A Gesture Recognition and Classification User Interface for RGB-based Videos (Bachelor Thesis, Finished)
Author
Description
Gestures are an inseparable part of everyday communication and a crucial component of human-machine interaction. There are several methods of capturing gestures that make recognition and detection easier, for instance using color, motion sensors, or depth cameras. However, in many cases the gestures to be recognized are captured with ordinary cameras, and many applications rely on recognizing gestures in this setting. The goal of the Deepmime Project is to develop a system that performs real-time gesture recognition based solely on RGB camera feeds.
The Deepmime System is envisioned to have three components (a minimal sketch of how they could interact follows this list):
- Feature extraction: The core of Deepmime is responsible for extracting spatiotemporal features from processed video input and using them for gesture classification.
- Preprocessing: Gestures are often performed in a noisy environment or in the background of the scene. It is important that the system can reduce this clutter and focus on the person performing the gesture. Additionally, since the video can contain “silent” stretches in which no gesture is being performed, temporal localization of gestures is also important.
- User Interface: The front-end of the system is responsible for accepting user queries (either as recorded videos or as real-time webcam input), sending each query to the backend where the classification is performed, and displaying the results in a user-friendly and appealing manner. One of the research goals is to make Deepmime adaptive based on expert user feedback, so the UI should include an expert mode in which users can provide such feedback and send it to the backend, where it is stored for further processing.
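A minimal sketch of how these three components could fit together is given below. All function names and the trivial placeholder bodies are illustrative assumptions, not part of an existing Deepmime implementation.

```python
# Illustrative sketch of the Deepmime processing chain (all names are
# placeholders): preprocessing -> feature extraction -> classification.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class GestureResult:
    label: str
    confidence: float


def preprocess(frames: np.ndarray) -> np.ndarray:
    """Reduce clutter, crop to the person, and temporally localize the gesture."""
    # Placeholder: a real implementation would run person detection and
    # temporal localization here.
    return frames


def extract_features(clip: np.ndarray) -> np.ndarray:
    """Extract spatiotemporal features from the preprocessed clip."""
    # Placeholder: in practice these would come from a learned model;
    # here the frames are simply averaged.
    return clip.mean(axis=0).flatten()


def classify(features: np.ndarray) -> List[GestureResult]:
    """Map the feature vector to ranked gesture classes with confidences."""
    # Placeholder: a trained classifier would be queried here.
    return [GestureResult(label="unknown", confidence=0.0)]


def recognize(frames: np.ndarray) -> List[GestureResult]:
    """Full pipeline: raw RGB frames in, ranked gesture labels out."""
    return classify(extract_features(preprocess(frames)))
```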
The Bachelor’s Thesis has the following objectives:
- Implementing a backend component, written in Python and linked to the aforementioned UI, that is capable of running existing gesture recognition modules. The backend should also store received videos, user feedback, and classification results (see the sketch after this list).
- Building the first version of the UI component with the following features: uploading, recording, and playback of video queries, and a self-updating results display. There should also be an expert mode in which users can, among other things, give feedback on the results. The UI should be written in TypeScript using Angular.
- [optional] As an extension, the preprocessing component of the backend could be improved by adapting temporal localization methods such as the Boundary-Sensitive Network (BSN, GitHub) proposed by Tianwei Lin et al.
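As a rough illustration of the backend objective, below is a minimal sketch assuming a Flask-based HTTP API: it stores an uploaded video query, runs a placeholder gesture recognition module, stores the classification result, and accepts expert feedback. The routes, storage layout, and the `run_recognition` stub are assumptions made for illustration, not an existing Deepmime interface.

```python
# Minimal sketch of a Python backend for Deepmime (Flask assumed).
# Routes, storage layout and run_recognition are illustrative placeholders.
import json
import uuid
from pathlib import Path
from typing import List

from flask import Flask, jsonify, request

app = Flask(__name__)
STORAGE = Path("storage")
STORAGE.mkdir(exist_ok=True)


def run_recognition(video_path: Path) -> List[dict]:
    """Placeholder for calling an existing gesture recognition module."""
    return [{"label": "unknown", "confidence": 0.0}]


@app.route("/api/queries", methods=["POST"])
def submit_query():
    """Accept an uploaded video query, classify it, and store both."""
    video = request.files["video"]
    query_id = str(uuid.uuid4())
    video_path = STORAGE / f"{query_id}.mp4"
    video.save(str(video_path))                       # store the received video
    results = run_recognition(video_path)             # run classification
    (STORAGE / f"{query_id}.results.json").write_text(json.dumps(results))
    return jsonify({"id": query_id, "results": results})


@app.route("/api/queries/<query_id>/feedback", methods=["POST"])
def submit_feedback(query_id: str):
    """Store expert-mode feedback next to the query for later processing."""
    feedback = request.get_json()
    (STORAGE / f"{query_id}.feedback.json").write_text(json.dumps(feedback))
    return jsonify({"status": "stored"})
```

The Angular UI could then submit queries and feedback against endpoints of this kind via HttpClient and poll for results to realize the self-updating display.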
Start / End Dates
2019/02/25 - 2019/06/25