Scalable Multi-Person Recognition, Clustering, and Retrieval in Large-Scale Collections (Bachelor Thesis, Ongoing)

Author

Mikolaj Siemaszkiewicz

Description

Traditional multimedia retrieval systems often treat images or videos as independent items and focus on object- or scene-level semantics. However, in many real-world video collections, such as movies, TV broadcasts, and online media, people are the primary subject. Users often search not only for scenes but also for specific individuals, recurring identities, or interactions among multiple people.

While face recognition has achieved strong performance under controlled conditions, scalable multi-person recognition and clustering in large-scale video collections remain challenging. Videos contain multiple people per frame, frequent occlusions, pose changes, motion blur, illumination variations, and dynamically changing group compositions. Furthermore, identity retrieval in video requires temporal consistency: the same person must be linked across frames and scenes.

This bachelor’s thesis investigates scalable multi-person recognition, clustering, and retrieval in large-scale video collections using vitrivr-engine. The work follows an incremental approach: it starts in a controlled setting with the CASTLE dataset, then gradually extends to less constrained, in-the-wild video data. The system will detect and recognize multiple persons per frame, cluster identities across a collection, and provide an interactive interface for identity-based exploration and retrieval.
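
To make the clustering step more concrete, the following is a minimal sketch of how identities might be grouped across a collection. It is not the thesis's method and does not use any vitrivr-engine API. It assumes that each detected face has already been turned into a fixed-length embedding vector by some face recognition model, and it uses a simple greedy centroid clustering with cosine similarity. The function names (`cosine_similarity`, `cluster_identities`) and the threshold of 0.8 are hypothetical choices for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_identities(embeddings, threshold=0.8):
    """Greedy clustering sketch (hypothetical helper, not a vitrivr-engine function).

    Each embedding joins the existing cluster whose centroid is most similar,
    provided the similarity reaches `threshold`; otherwise it starts a new
    cluster. Returns one integer cluster label per input embedding.
    """
    centroids = []  # running mean embedding per cluster
    counts = []     # number of members per cluster
    labels = []

    for emb in embeddings:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim

        if best_idx >= 0 and best_sim >= threshold:
            # Update the running mean of the matched cluster.
            n = counts[best_idx]
            centroids[best_idx] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best_idx], emb)
            ]
            counts[best_idx] = n + 1
            labels.append(best_idx)
        else:
            # No sufficiently similar cluster: treat this as a new identity.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)

    return labels


# Toy 2-D "embeddings" standing in for real face descriptors (illustrative data).
sample_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [1.0, 0.05],
]
sample_labels = cluster_identities(sample_embeddings, threshold=0.8)
print(sample_labels)  # [0, 0, 1, 1, 0]
```

A greedy single pass like this is order-dependent and compares every embedding against every centroid, so at collection scale an approximate nearest-neighbour index and temporal cues such as tracking faces across adjacent frames would likely be needed; those are outside the scope of this sketch.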

Start / End Dates

2026/03/23 - 2026/07/22

Supervisors

Research Topics