vitrivr reads (Bachelor Project, Ongoing)
Text in a video often conveys information that is not easily expressed otherwise, and retrieval based on scene text has proven invaluable in retrieval competitions such as VBS and LSC. This project deals with the integration of state-of-the-art scene-text transcription into vitrivr using TensorFlow. Ideally, the implementation provides not only the text but also its location. While scene-text transcription is often a two-stage process in which the text is first located and then transcribed, an implementation can also use end-to-end transcription where appropriate.
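The two-stage structure mentioned above can be sketched as a pair of interfaces composed into a pipeline. This is a minimal illustration only: the type and method names (`TextDetector`, `TextRecognizer`, `SceneText`) are assumptions for this sketch, not the Cineast or TensorFlow Java API.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result type: transcribed text plus its location in the frame. */
record SceneText(String text, Rectangle location) {}

/** Stage 1: locate text regions in a raw frame (illustrative name, not the Cineast API). */
interface TextDetector {
    List<Rectangle> detect(byte[] frame);
}

/** Stage 2: transcribe a single detected region. */
interface TextRecognizer {
    String recognize(byte[] frame, Rectangle region);
}

/** Two-stage pipeline: detect text regions first, then transcribe each one. */
class TwoStagePipeline {
    private final TextDetector detector;
    private final TextRecognizer recognizer;

    TwoStagePipeline(TextDetector detector, TextRecognizer recognizer) {
        this.detector = detector;
        this.recognizer = recognizer;
    }

    List<SceneText> transcribe(byte[] frame) {
        List<SceneText> results = new ArrayList<>();
        for (Rectangle region : detector.detect(frame)) {
            results.add(new SceneText(recognizer.recognize(frame, region), region));
        }
        return results;
    }
}
```

An end-to-end model would collapse the two interfaces into a single call returning `List<SceneText>` directly; keeping the stages separate makes it easier to swap pre-trained detection and recognition models independently.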
As an extension, the project could tackle the challenge of scrolling text, or of merging text across segments (e.g. subtitles). The following steps are part of the project:
- Survey existing implementations with available pre-trained models and assess their performance and usability
- Re-implement the code as a standalone component using the TensorFlow Java library
- Implement a new feature in Cineast which extracts scene text from images and videos and stores it using the existing pipeline
- Evaluate the performance in terms of runtime per shot / frame
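For the evaluation step, per-frame runtime can be measured with plain wall-clock timing around the extraction call. The helper below is a hypothetical sketch (the `RuntimeBenchmark` name and the `Consumer<byte[]>` signature are assumptions, not part of Cineast); a real evaluation would also want warm-up runs and per-shot aggregation.

```java
import java.util.function.Consumer;

/** Hypothetical helper: average wall-clock runtime per frame of an extraction step. */
class RuntimeBenchmark {
    static double averageMillisPerFrame(Consumer<byte[]> extractor, byte[][] frames) {
        long start = System.nanoTime();
        for (byte[] frame : frames) {
            extractor.accept(frame);  // run the extractor on each frame
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1e6 / frames.length;  // nanoseconds -> milliseconds, averaged
    }
}
```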