Hybrid human-machine classification system for cultural heritage data

Shabani, Shaban (Haute école de gestion Arc, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Sokhn, Maria (Haute école de gestion Arc, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Schuldt, Heiko (Mathematics and computer science, University of Basel, Basel, Switzerland)

The advancement of digital technologies has helped cultural heritage organizations digitize their data collections and make them accessible via online platforms. These platforms have enabled citizens to contribute to the digital preservation of cultural heritage by sharing documents and their knowledge. However, many historical datasets suffer from incomplete metadata, and to fill these gaps cultural heritage organizations depend heavily on domain experts. In this paper, we address the problem of completing the metadata of historical digital collections. To this end, we introduce a new hybrid human-machine model that jointly integrates the predictions of a deep multi-input model with labels inferred from multiple crowd judgements. The multi-input model uses visual features extracted from the images and textual features from the metadata, complemented with the Wikipedia classes of concepts extracted from the text. For crowd answer aggregation, our method takes into account each worker's reliability score, which is based on the worker's performance in their task history and in our task. We applied our hybrid approach to a cultural heritage platform, and the evaluations show that it outperforms both deep learning and crowdsourcing applied individually.
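The abstract does not state how reliability scores are computed or how crowd labels and model predictions are fused. Purely as a generic illustration of the idea, the sketch below shows one common pattern: a reliability-weighted vote over crowd answers, mixed with a classifier's class probabilities through a convex combination. All function names, the history/current-task weighting `w_history`, and the mixing weight `alpha` are hypothetical assumptions, not the authors' method.

```python
from collections import defaultdict


def worker_reliability(history_accuracy, task_accuracy, w_history=0.5):
    """Hypothetical reliability score: a weighted mix of a worker's accuracy
    on past tasks and on checked questions in the current task (both in [0, 1])."""
    return w_history * history_accuracy + (1 - w_history) * task_accuracy


def aggregate_crowd(answers, reliability):
    """Reliability-weighted vote over (worker_id, label) pairs.
    Returns a normalized distribution over the labels that received votes."""
    scores = defaultdict(float)
    for worker, label in answers:
        scores[label] += reliability[worker]
    total = sum(scores.values())
    if total == 0:
        return {}  # no usable votes
    return {label: s / total for label, s in scores.items()}


def hybrid_predict(model_probs, crowd_dist, alpha=0.5):
    """Convex combination of model and crowd distributions (alpha weights the model).
    Returns the highest-scoring label and the combined distribution."""
    labels = set(model_probs) | set(crowd_dist)
    combined = {
        label: alpha * model_probs.get(label, 0.0)
        + (1 - alpha) * crowd_dist.get(label, 0.0)
        for label in labels
    }
    return max(combined, key=combined.get), combined
```

For example, with three workers whose reliabilities are `worker_reliability(0.9, 0.8)`, `worker_reliability(0.6, 0.5)` and `worker_reliability(0.7, 0.9)`, two reliable "portrait" votes outweigh one "landscape" vote, so the crowd distribution can override a model that slightly prefers "landscape". A fixed `alpha` is the simplest fusion rule; a real system might instead tune it on validation data or vary it with the crowd's agreement.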


Note: Due to the COVID-19 outbreak, the Seattle venue of the 2nd Workshop on Structuring and Understanding of Multimedia heritAge Contents was cancelled. The proceedings of the online conference were nevertheless published according to the original schedule.


Conference Type:
published full paper
Faculty:
Economie et Services
School:
HEG Arc
Institute:
IDO - Institut de Digitalisation des organisations
Subject(s):
Economie/gestion
Informatique
Publisher:
Seattle, USA, 12 October 2020
Date:
2020-10
Pagination:
pp. 49–56
Published in:
Proceedings of the 2nd Workshop on Structuring and Understanding of Multimedia heritAge Contents (SUMAC'2020)
ISBN:
9781450381550

 Record created 2021-04-21, last modified 2021-04-26
