Abstract

The term "historical documents" covers an enormous variety of document types that differ in script, language, writing support, and degree of degradation. For automatic processing with machine learning and pattern recognition methods, it would be ideal to share labeled learning samples and trained statistical models across similar documents, rather than retraining from scratch for every new historical document. In this paper, we propose using the reconstruction error of autoencoders to compare historical manuscripts, with the goal of clustering them according to their visual appearance. A low reconstruction error suggests visual similarity between a new manuscript and a known manuscript on which the autoencoder was trained in an unsupervised fashion. Preliminary experiments conducted on 10 different manuscripts written with ink on parchment demonstrate that the reconstruction error can group similar writing styles. In particular, near-perfect results are reported for discriminating between Carolingian and cursive script.
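
The comparison by reconstruction error can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tied-weight linear autoencoder (equivalent to PCA) in place of whatever architecture the paper uses, and it uses synthetic vectors in place of real manuscript image patches. All function names, the patch size, and the hidden dimension are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_linear_autoencoder(patches, n_hidden):
    """Fit a tied-weight linear autoencoder (equivalent to PCA) without labels."""
    mean = patches.mean(axis=0)
    # Rows of vt are principal directions; the top n_hidden act as encoder weights.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:n_hidden]


def reconstruction_error(model, patches):
    """Mean squared error between patches and their encode/decode round trip."""
    mean, components = model
    centered = patches - mean
    codes = centered @ components.T      # encode
    reconstructed = codes @ components   # decode
    return float(np.mean((centered - reconstructed) ** 2))


def synthetic_patches(rng, basis, n, noise=0.05):
    """Stand-in for flattened image patches that share one visual 'style' (basis)."""
    coeffs = rng.normal(size=(n, basis.shape[0]))
    return coeffs @ basis + noise * rng.normal(size=(n, basis.shape[1]))


rng = np.random.default_rng(0)
dim = 64                               # assumed 8x8 patches, flattened
style_a = rng.normal(size=(4, dim))    # hypothetical style of the known manuscript
style_b = rng.normal(size=(4, dim))    # hypothetical, visually different style

known = synthetic_patches(rng, style_a, 500)
similar = synthetic_patches(rng, style_a, 200)
different = synthetic_patches(rng, style_b, 200)

# Train on the known manuscript only, then score two unseen manuscripts.
model = fit_linear_autoencoder(known, n_hidden=4)
err_similar = reconstruction_error(model, similar)
err_different = reconstruction_error(model, different)
print(f"similar: {err_similar:.4f}  different: {err_different:.4f}")
```

In this toy setting the manuscript that shares the style of the training data is reconstructed with a much lower error than the other one. Computing such errors for every pair of manuscripts would give a dissimilarity matrix that a standard clustering method could take as input.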
