Scalable Near-duplicate Detection (Master Thesis, Ongoing)

Author

Silvan Heller

Description

The system should be able to ingest data from different sources, at least Twitter and Reddit. It should then efficiently detect near-duplicates and, ideally, differentiate between kinds of near-duplicates, such as exact matches and transformed content, in order to perform source detection.
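To illustrate the distinction between exact matches and transformed content, here is a minimal sketch for text posts only. It is not the method used in the thesis: the function names (`shingles`, `jaccard`, `classify`), the shingle size of 3 words, and the 0.5 threshold are all assumptions chosen for illustration. Exact matches are found by comparing content hashes; near-duplicates are found by Jaccard similarity over word shingles.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text (k is an assumed value)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(a, b, threshold=0.5):
    """Label a pair of texts as 'exact', 'near-duplicate', or 'distinct'.

    The threshold is hypothetical and would need tuning on real data.
    """
    digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
    digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
    if digest_a == digest_b:
        return "exact"
    if jaccard(shingles(a), shingles(b)) >= threshold:
        return "near-duplicate"
    return "distinct"


original = "the quick brown fox jumps over the lazy dog"
edited = "the quick brown fox jumps over the lazy cat"
print(classify(original, original))  # exact
print(classify(original, edited))    # near-duplicate
```

Comparing every pair of posts this way grows quadratically with the number of posts, so a scalable system would likely replace the pairwise loop with an approximate scheme such as MinHash with locality-sensitive hashing.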

Start / End Dates

2017/09/25 - 2018/03/24

Supervisors

Research Topics