vitrivr reads (Bachelor Project, Finished)

Author

Kalthoum Nemmour

Description

Text in a video often conveys information that is not easily expressed otherwise. Additionally, retrieval based on scene text has proven invaluable in retrieval competitions such as the Video Browser Showdown (VBS) and the Lifelog Search Challenge (LSC). This project deals with the integration of state-of-the-art scene-text transcription into vitrivr using TensorFlow. Ideally, the implementation provides not only the text but also its location. While scene-text transcription is often a two-stage process in which the text is first located and then transcribed, an implementation can also use end-to-end transcription where appropriate.
As an extension, the challenge of scrolling text or merging text across segments (e.g. subtitles) could be tackled. The following steps are part of the project:

Start / End Dates

2020/02/01 - 2020/05/27

Supervisors

Research Topics