Systematic comparison of deep learning strategies for weakly supervised Gleason grading

Otálora, Sebastian (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis) ; University of Geneva, Switzerland) ; Atzori, Manfredo (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Khan, Amjad (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Jiménez del Toro, Oscar Alfonso (University of Geneva, Switzerland) ; Andrearczyk, Vincent (University of Geneva, Switzerland) ; Müller, Henning (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis) ; University of Geneva, Switzerland)

Prostate cancer (PCa) is one of the most frequent cancers in men. Grading is required before treatment begins. The Gleason Score (GS) describes and measures the regularity of the gland patterns that a pathologist observes on microscopic or digital images of prostate biopsies and prostatectomies. Deep learning (DL) models are the state-of-the-art computer vision techniques for Gleason grading, learning high-level features with high classification power. However, obtaining robust models with clinical-grade performance requires a large number of local annotations. Previous research showed that it is feasible to detect low- and high-grade PCa from digitized tissue slides relying only on the less expensive report-level (weakly) supervised labels, that is, global rather than local labels. Despite this, few articles focus on classifying the finer-grained GS classes with weakly supervised models. The objective of this paper is to compare weakly supervised strategies for classifying the five classes of the GS from the whole slide image, using the global diagnostic label from the pathology reports as the only source of supervision. We compare different models trained on handcrafted features and on shallow and deep learning representations. Training and evaluation are done on the publicly available TCGA-PRAD dataset, comprising 341 whole slide images of radical prostatectomies, where small patches are extracted within tissue areas and assigned the global report label as ground truth. Our results show that DL networks and class-wise data augmentation outperform other strategies and their combinations, reaching a kappa score of κ = 0.44, which could be further improved with a larger dataset or by combining strongly and weakly supervised models.
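As an illustration of the two mechanics the abstract mentions, the following minimal Python sketch shows how a slide-level report label might be propagated to every patch of that slide, and how a kappa agreement score can be computed. The function names, the patch identifiers, and the encoding of the five GS classes as integers 0 to 4 are hypothetical. The abstract does not say whether the reported κ is weighted, so the sketch uses plain (unweighted) Cohen's kappa as an assumption rather than the authors' exact metric.

```python
from collections import Counter


def propagate_slide_label(patch_ids, slide_label):
    """Weak supervision: every patch inherits its slide's report-level label."""
    return {patch_id: slide_label for patch_id in patch_ids}


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of positions where both labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: computed from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: class indices 0..4 stand in for the five GS classes.
patch_labels = propagate_slide_label(["patch_0", "patch_1", "patch_2"], 3)
reference = [0, 1, 1, 2, 3, 4]
predicted = [0, 1, 2, 2, 3, 3]
kappa = cohen_kappa(reference, predicted)
```

Because Gleason classes are ordered, a weighted kappa that penalizes distant disagreements more heavily may be a better fit in practice; the unweighted form is used here only to keep the sketch short.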

Conference Type:
published full paper
Economie et Services
Institut Informatique de gestion
Houston, USA, 15-20 February 2020
8 p.
Published in:
Proceedings of the SPIE Medical Imaging 2020
Numeration (vol. no.):
Vol. 11320
Note: The status of this file is: restricted

 Record created 2020-11-13, last modified 2020-11-17
