Abstract
Visualization methods for Convolutional Neural Networks (CNNs) are spreading within the medical community to obtain explainable AI (XAI). Qualitative assessment of the explanations alone, however, carries a risk of confirmation bias. This paper proposes a methodology for the quantitative evaluation of common visualization approaches for histopathology images, namely Class Activation Mapping (CAM) and Local Interpretable Model-Agnostic Explanations (LIME). Our evaluation assesses four main points: the alignment with clinical factors, the agreement between XAI methods, and the consistency and repeatability of the explanations. To do so, we compare the intersection over union (IoU) of multiple visualizations of the CNN attention with the semantic annotation of functionally different nuclei types. The experimental results do not show attributions to the different nuclei types that are stronger than those of a randomly initialized CNN. The visualizations hardly agree on the salient areas, and the LIME outputs show particularly unstable repeatability and consistency. Qualitative evaluation alone is thus not sufficient to establish the appropriateness and reliability of these visualization tools. The code is available on GitHub at bit.ly/2K4BHKz.
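As an illustration of the IoU comparison described above, the following is a minimal sketch, not the authors' released code: the function name `iou` and the fixed binarization threshold are assumptions for how a saliency map could be scored against a binary nuclei annotation mask.

```python
import numpy as np

def iou(saliency_map: np.ndarray, annotation_mask: np.ndarray,
        threshold: float = 0.5) -> float:
    """Intersection over Union between a binarized saliency map and a
    binary annotation mask (e.g. one nuclei type)."""
    pred = saliency_map >= threshold              # binarize the CNN attention map
    gt = annotation_mask.astype(bool)             # ground-truth nuclei annotation
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

# Hypothetical usage with random data standing in for a real heatmap and mask.
rng = np.random.default_rng(0)
heatmap = rng.random((224, 224))                  # e.g. a CAM or LIME importance map
mask = rng.random((224, 224)) > 0.9               # e.g. an epithelial-nuclei annotation
print(f"IoU: {iou(heatmap, mask):.3f}")
```

Comparing such scores against those obtained from a randomly initialized CNN, as the paper does, gives a baseline for whether the explanations carry any class-relevant signal.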