Classification of noisy free-text prostate cancer pathology reports using natural language processing

Dhrangadhariya, Anjani; Otálora, Sebastian; Atzori, Manfredo; Müller, Henning

doi:10.1007/978-3-030-68763-2_12

2021

Abstract

Free-text reporting has been the main approach in clinical pathology practice for decades. Pathology reports are an essential information source for guiding the treatment of cancer patients and for cancer registries, which process high volumes of free-text reports annually. Information coding and extraction are usually performed manually, which is expensive and time-consuming because reports vary widely between institutions, usually contain noise, and lack a standard structure. This paper presents strategies based on natural language processing (NLP) models to classify noisy free-text pathology reports of high-grade and low-grade prostate cancer from the open-source repository TCGA (The Cancer Genome Atlas). We used paragraph vectors to encode the reports and compared them with n-gram and TF-IDF representations. The best representation, based on the distributed bag-of-words variant of paragraph vectors, obtained an F1-score of 0.858 and an AUC of 0.854 using a logistic regression classifier. We investigated the words most relevant to the classifier in each case using the LIME interpretability tool, confirming the classifier's usefulness for selecting relevant diagnostic words. Our results show the feasibility of using paragraph embeddings to represent and classify pathology reports.
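As a rough illustration of the n-gram and TF-IDF baseline representation mentioned in the abstract, the following minimal sketch computes TF-IDF weights over word unigrams and bigrams using only the Python standard library. It is not the authors' implementation: the helper names `ngrams` and `tfidf`, the whitespace tokenization, the smoothed IDF formula, and the two toy report snippets are all assumptions made for illustration, and the snippets are invented rather than taken from TCGA.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_max=2):
    """Return one {term: weight} dict per document.

    Assumption: raw term frequency times the smoothed IDF
    ln((1 + N) / (1 + df)) + 1, which is one common variant.
    The abstract does not state which weighting the paper used.
    """
    # Lowercase, split on whitespace, and expand into n-grams.
    tokenized = [ngrams(doc.lower().split(), n_max) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


# Invented toy snippets for illustration only (not TCGA data).
reports = [
    "prostate adenocarcinoma gleason score 3+3",
    "prostate adenocarcinoma gleason score 4+5",
]
vectors = tfidf(reports)
```

In this toy setting, terms shared by both snippets (such as `prostate`) receive a lower weight than terms that appear in only one of them (such as `3+3`), which is the property that makes TF-IDF a reasonable baseline for separating report classes. Vectors like these could then be passed to a classifier such as logistic regression.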

Details

Title Classification of noisy free-text prostate cancer pathology reports using natural language processing

Author(s) Dhrangadhariya, Anjani (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis))
Otálora, Sebastian (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis))
Atzori, Manfredo (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis))
Müller, Henning (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis); University of Geneva, Switzerland)

Date 2021-01

Published in Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part I

Publisher Milan, Italy, 10 January 2021

Pages 154-166

Presented at Pattern Recognition. ICPR International Workshops and Challenges, Milan, Italy, 2021-01-10, 2021-01-10

ISBN 978-3-030-68762-5

DOI https://doi.org/10.1007/978-3-030-68763-2_12

ISSN 0302-9743

Series and no. Lecture Notes in Computer Science, vol. 12661

Keywords (free) pathology reports; natural language processing; paragraph embedding

Paper type published full paper

Domain Economie et Services

School HEG-VS

Institute Institut Informatique de gestion

The document appears in Conference documents
Global

External resource(s) Online presentation
