Gesture of Interest: Gesture Search for Multi-Person, Multi-Perspective TV Footage
Real-world video datasets, and TV recordings in particular, often show several people from several camera angles, which poses significant challenges for gesture recognition and retrieval. Beyond its interest to linguists, gesture retrieval is a novel and challenging application for multimedia retrieval. In this paper, we propose a method for spatio-temporal gesture retrieval based on visual and pose information that retrieves similar gestures in multi-person scenes across continuous shots. Attention-aware features extracted from human pose keypoints, together with a pre-processing module, make gesture retrieval less susceptible to background noise and occlusion. We evaluate our method on a subset of the NewsScape dataset. Our experimental results demonstrate the effectiveness of the proposed method in retrieving similar gestures in occluded scenes, as measured by the quality of the top-5 results.