Scalable Near-duplicate Detection (Master's Thesis, Ongoing)
The system should ingest data from multiple sources, at least Twitter and Reddit. It should then efficiently detect near-duplicates and, ideally, distinguish between kinds of near-duplicates, such as exact matches and transformed content, in order to support source detection.
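To make the distinction between exact matches and transformed content concrete, here is a minimal sketch. It is not the thesis's method: word shingling with Jaccard similarity is just one common technique, and the function names, the shingle size `k=3`, and the `threshold=0.5` cutoff are all assumptions chosen for illustration.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text, k=3):
    """Return the set of k-word shingles of the normalized text (k is an assumed default)."""
    words = normalize(text).split()
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(text_a, text_b, threshold=0.5):
    """Label a pair as 'exact', 'near-duplicate', or 'distinct'.

    The threshold is a hypothetical value, not one taken from the project.
    """
    if normalize(text_a) == normalize(text_b):
        return "exact"
    similarity = jaccard(shingles(text_a), shingles(text_b))
    return "near-duplicate" if similarity >= threshold else "distinct"
```

Comparing every pair of posts this way is quadratic in the number of posts, so a scalable system would likely need an approximate scheme on top, for example MinHash signatures with locality-sensitive hashing; that choice is likewise an assumption here rather than something the description specifies.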
Start / End Dates
2017/09/25 - 2018/03/24