Scalable Near-duplicate Detection (Master Thesis, Ongoing)


Silvan Heller


The system should be able to ingest data from different sources, at least Twitter and Reddit. It should then efficiently detect near-duplicates and, ideally, differentiate between kinds of near-duplicates, such as exact matches and transformed content, in order to perform source detection.

Start / End Dates

2017/09/25 - 2018/03/24


Research Topics