Abstract

Automatic layout analysis of historical documents has to cope with a wide variety of scripts, writing supports, and digitization qualities. Under these conditions, designing robust features for machine learning is highly challenging. We use convolutional autoencoders to learn features from the images. To increase classification accuracy and reduce the feature dimension, we propose a novel feature selection method that cascades adapted versions of two conventional methods. Compared with three conventional methods and our previous work, the proposed method achieves higher classification accuracy in most cases while keeping the feature dimension low. In addition, we find that a significant number of autoencoder features are redundant or irrelevant for the classification, and we offer explanations for this. To the best of our knowledge, this paper is one of the first investigations in the field of image processing into detecting redundant and irrelevant autoencoder features using feature selection.
