Abstract

Finding key terms in scanned historical manuscripts is invaluable for accessing our written cultural heritage. While keyword spotting (KWS) approaches based on machine learning achieve the best spotting results in the current state of the art, they require annotated training samples to learn the writing style of a particular manuscript collection. In this paper, we propose an annotation-free KWS method that does not require any labeled handwriting sample but learns from a printed font instead. First, we train a deep convolutional character detection system on synthetic pages using printed characters. Afterwards, the structure of the detected characters is modeled by means of graphs and compared with search terms using graph matching. We evaluate our method for spotting logographic Chu Nom characters on the newly introduced Kieu database, a historical Vietnamese manuscript of 719 scanned pages of the famous Tale of Kieu. Our results show that search terms can be found with promising precision both when handwritten samples are provided (query by example) and when printed characters are provided (query by string).
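The abstract does not specify how the character graphs are built or matched. As a minimal sketch, assuming each detected character is a graph whose nodes are 2D keypoints and whose unlabeled edges are strokes, the comparison step could use the bipartite approximation of graph edit distance: the nodes of two graphs are matched through a square matrix of substitution, deletion, and insertion costs, and the cheapest assignment gives the distance. The names CharGraph, bipartite_ged, and spot and the weights tau_node and tau_edge are hypothetical. The brute-force assignment below only suits tiny graphs; a practical system would use the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment).

```python
import itertools
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
INF = float("inf")


class CharGraph:
    """Hypothetical character graph: keypoints as nodes, strokes as undirected edges."""

    def __init__(self, nodes: Sequence[Point], edges: Set[Tuple[int, int]]):
        self.nodes = list(nodes)
        # Only node degrees are kept; they give a coarse estimate of edge edits.
        self.degree = [0] * len(self.nodes)
        for i, j in edges:
            self.degree[i] += 1
            self.degree[j] += 1


def bipartite_ged(g1: CharGraph, g2: CharGraph,
                  tau_node: float = 1.0, tau_edge: float = 0.5) -> float:
    """Approximate graph edit distance via an (n+m) x (n+m) node assignment."""
    n, m = len(g1.nodes), len(g2.nodes)
    size = n + m
    cost = [[INF] * size for _ in range(size)]
    # Substitution block (n x m): keypoint distance plus degree mismatch.
    for i in range(n):
        for j in range(m):
            (x1, y1), (x2, y2) = g1.nodes[i], g2.nodes[j]
            cost[i][j] = (math.hypot(x1 - x2, y1 - y2)
                          + tau_edge * abs(g1.degree[i] - g2.degree[j]))
    # Deletion block (n x n, diagonal only): delete node i and its edges.
    for i in range(n):
        cost[i][m + i] = tau_node + tau_edge * g1.degree[i]
    # Insertion block (m x m, diagonal only): insert node j and its edges.
    for j in range(m):
        cost[n + j][j] = tau_node + tau_edge * g2.degree[j]
    # Dummy block (m x n): matching empty to empty costs nothing.
    for j in range(m):
        for i in range(n):
            cost[n + j][m + i] = 0.0
    # Brute-force optimal assignment with early pruning (tiny graphs only).
    best = INF
    for perm in itertools.permutations(range(size)):
        total = 0.0
        for row, col in enumerate(perm):
            total += cost[row][col]
            if total >= best:
                break
        else:
            best = total
    return best


def spot(query: CharGraph, detections: Dict[str, CharGraph]) -> List[Tuple[str, float]]:
    """Rank detected characters by ascending distance to the query graph."""
    scored = [(name, bipartite_ged(query, g)) for name, g in detections.items()]
    return sorted(scored, key=lambda item: item[1])


query = CharGraph([(0, 0), (1, 0), (1, 1)], {(0, 1), (1, 2)})
detections = {
    "box_a": CharGraph([(0, 0), (1, 0), (1, 1)], {(0, 1), (1, 2)}),
    "box_b": CharGraph([(0, 0), (2, 0), (2, 2), (0, 2)],
                       {(0, 1), (1, 2), (2, 3), (3, 0)}),
}
ranking = spot(query, detections)
print(ranking[0])  # ('box_a', 0.0)
```

Ranking detections by ascending distance corresponds to the spotting step; under these assumptions the query graph could come either from a handwritten sample (query by example) or from a rendered printed character (query by string).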
