A comprehensive study of ImageNet pre-training for historical document image analysis

Studer, Linda (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland) ; Alberti, Michele (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland) ; Pondenkandath, Vinaychandran (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland) ; Goktepe, Pinar (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland) ; Kolonko, Thomas (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland) ; Fischer, Andreas (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland ; School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland) ; Liwicki, Marcus (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland ; Machine Learning Group, Lulea University of Technology, Lulea, Sweden) ; Ingold, Rolf (Document Image and Voice Analysis Group (DIVA), University of Fribourg, Fribourg, Switzerland)

Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In the current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel-level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.


Conference Type:
full paper
Faculty:
Ingénierie et Architecture
School:
HEIA-FR
Institute:
iCoSys - Institut des systèmes complexes
Publisher:
Sydney, New South Wales, Australia, 20-25 September 2019
Date:
2019-09
Pagination:
6 p.
Published in:
Proceedings of ICDAR 2019 : 15th International Conference on Document Analysis and Recognition, 20-25 September 2019, Sydney, New South Wales, Australia
 Record created 2020-01-17, last modified 2020-01-21
