Abstract

In this paper we present a novel text line segmentation method for historical manuscript images. We use a pyramidal approach: at the first level, pixels are classified as text, background, decoration, or out of page; at the second level, text regions are split into text line and non-text line. Color and texture features based on Local Binary Patterns and Gabor Dominant Orientation are used for classification. A modified Fast Correlation-Based Filter feature selection algorithm removes redundant and irrelevant features. Finally, the text line segmentation results are refined by a smoothing post-processing procedure. Unlike other projection profile or connected component methods, the proposed algorithm does not use any script-specific knowledge and is applicable to color images. The proposed algorithm is evaluated on three historical manuscript image datasets of diverse nature and achieves an average precision of 91% and recall of 84%. Experiments also show that the proposed algorithm is robust with respect to changes in writing style, page layout, and image noise.
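
As an illustration of the kind of texture feature mentioned above, the following is a minimal sketch of the basic 8-neighbour Local Binary Pattern code, written in plain Python. It is not the paper's implementation: the neighbour ordering, the `>=` comparison, and the function names `lbp_code` and `lbp_image` are assumptions made for this example, and the abstract does not say which LBP variant the method uses.

```python
def lbp_code(patch):
    """Return the basic 8-neighbour LBP code for the centre of a 3x3 patch.

    Assumption for this sketch: neighbours are visited clockwise from the
    top-left corner, and a neighbour sets its bit when it is >= the centre.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_image(gray):
    """Compute LBP codes for every interior pixel of a 2-D list of grey values.

    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    rows, cols = len(gray), len(gray[0])
    out = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            patch = [gray[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            row.append(lbp_code(patch))
        out.append(row)
    return out
```

In a pixel classification setting, a histogram of such codes over a window around each pixel could serve as one texture descriptor alongside color features; how the features are actually combined in the proposed method is not specified here.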
