Multi-angle, Multi-person Gesture Recognition (Bachelor Thesis, Completed)
Communicative and co-speech gestures are an inseparable part of everyday human interaction, and understanding them is a first step toward developing human-machine interaction methods. Recognizing and retrieving such gestures in recorded video is challenging in multi-camera scenarios where a gesture's articulation spans different shots and camera angles, a setting that has rarely been addressed in the literature. Moreover, such settings often involve multiple people in the scene, requiring the system to track and identify the gestures of each individual separately. In this thesis, a pose-based gesture retrieval system using deep learning methods is developed to specifically tackle the multi-person, multi-angle scenarios that commonly occur in talk shows and news footage. The system uses semantic segmentation and person re-identification to track the hand articulation and posture of each person in the frame. Retrieval is based on features extracted from the skeletal keypoints of segmented human instances across a sequence of frames, with the feature extractor trained on gesture recognition datasets. To evaluate retrieval performance, the system was assessed by volunteers on a portion of the NewsScape dataset provided by the UCLA Library.
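The retrieval step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the thesis implementation: it assumes each tracked person has already been reduced to a per-frame list of skeletal keypoint coordinates (as a pose estimator would produce), uses a simple temporal average as a stand-in for the learned deep features, and ranks gallery clips by cosine similarity to a query clip. All function names here are illustrative.

```python
import math

def embed(sequence):
    """Average the flattened keypoint coordinates over time to get a
    fixed-length descriptor (a stand-in for the learned features)."""
    dim = len(sequence[0])
    vec = [0.0] * dim
    for frame in sequence:
        for i, value in enumerate(frame):
            vec[i] += value
    return [v / len(sequence) for v in vec]

def cosine(a, b):
    """Cosine similarity between two descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_sequence, gallery):
    """Rank gallery clips by descriptor similarity to the query clip.

    gallery maps a clip identifier to its keypoint sequence; each
    sequence is a list of frames, each frame a flat list of coordinates.
    """
    q = embed(query_sequence)
    scored = [(name, cosine(q, embed(seq))) for name, seq in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In the actual system, the averaging step would be replaced by the trained feature extractor, and the keypoint sequences would come from pose estimation applied to the segmented, re-identified person tracks spanning multiple shots.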
Start / End Dates
2020/06/08 - 2020/09/07