Hybrid Human-Machine Classification System for Cultural Heritage Data

Shabani, Shaban; Sokhn, Maria; Schuldt, Heiko

Authors

Shaban Shabani, Maria Sokhn, Heiko Schuldt

Type

In Proceedings

Date

2020/10

Appears in

Proceedings of the 2nd workshop on Structuring and Understanding of Multimedia heritAge Contents (SUMAC 20)

Location

Seattle, WA, USA (held virtually)

Abstract

The advancement of digital technologies has helped cultural heritage organizations digitize their data collections and improve their accessibility via online platforms. These platforms have enabled citizens to contribute to the digital preservation of cultural heritage by sharing documents and their knowledge. However, many historical datasets suffer from incomplete metadata. To solve this issue, cultural heritage organizations depend heavily on domain experts. In this paper, we address the issue of completing the metadata of historical digital collections. For this, we introduce a new hybrid human-machine model. This model jointly integrates the predictions of a deep multi-input model and labels inferred from multiple crowd judgements. The multi-input model uses visual features extracted from the images and textual features from the metadata, complemented with the Wikipedia classes of concepts extracted from the text. For crowd answer aggregation, our method considers the workers’ reliability scores. Each score is based on the worker’s performance in their task history and in our task. We have applied our hybrid approach to a cultural heritage platform, and the evaluations show that it outperforms both deep learning and crowdsourcing when applied individually.
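
The abstract does not specify how the crowd labels and the model predictions are combined, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a reliability-weighted vote for crowd aggregation and a simple linear blend with the model's class probabilities. The function names, the neutral default weight of 0.5 for unknown workers, and the mixing weight `alpha` are all hypothetical.

```python
from collections import defaultdict


def aggregate_crowd(votes, reliability, default_weight=0.5):
    """Reliability-weighted vote over (worker_id, label) pairs.

    Returns a normalized score per label. Workers without a known
    reliability score get `default_weight` (an assumption of this sketch).
    """
    scores = defaultdict(float)
    for worker, label in votes:
        scores[label] += reliability.get(worker, default_weight)
    total = sum(scores.values())
    if total == 0:
        return {}
    return {label: score / total for label, score in scores.items()}


def hybrid_predict(model_probs, crowd_probs, alpha=0.5):
    """Linearly blend model and crowd distributions.

    `alpha` is a hypothetical mixing weight: 1.0 trusts only the model,
    0.0 trusts only the crowd. Returns (best_label, fused_scores).
    """
    labels = set(model_probs) | set(crowd_probs)
    fused = {
        label: alpha * model_probs.get(label, 0.0)
        + (1 - alpha) * crowd_probs.get(label, 0.0)
        for label in labels
    }
    return max(fused, key=fused.get), fused


# Illustrative data only: three workers label one image.
votes = [("w1", "building"), ("w2", "portrait"), ("w3", "building")]
reliability = {"w1": 0.9, "w2": 0.6, "w3": 0.5}
model_probs = {"building": 0.4, "portrait": 0.6}

crowd_probs = aggregate_crowd(votes, reliability)
label, fused = hybrid_predict(model_probs, crowd_probs)
print(label, fused)
```

In this toy example the model alone would pick "portrait", while the two more reliable crowd votes shift the fused decision to "building", which is the kind of complementarity a hybrid approach aims to exploit.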

Research Projects

City-Stories: Spatio-temporal search over crowdsourced content