Studying public medical images from the Open Access literature and social networks for model training and knowledge extraction

Müller, Henning (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis) ; University of Geneva, Switzerland) ; Andrearczyk, Vincent (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Jiménez del Toro, Oscar Alfonso (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Dhrangadhariya, Anjani (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Schaer, Roger (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Atzori, Manfredo (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis))

Medical imaging research has long struggled to obtain access to large collections of images, both because of privacy constraints and because of the high cost of having physicians annotate images. With public scientific challenges and funding agencies fostering data sharing, repositories are becoming available, particularly for cancer research in the US. Still, data and annotations are most often available only for narrow domains and specific tasks. The medical literature (particularly articles contained in MedLine) has been used for research for many years, as it contains a large amount of medical knowledge. Most analyses have focused on text, for example creating semi-automated systematic reviews, aggregating content on specific genes and their functions, or supporting information retrieval to access specific content. Research on images from the medical literature has been more limited, as MedLine abstracts are publicly available but include no images. With PubMed Central, all the biomedical open access literature has become accessible for analysis, with images and text in structured format. This makes such data easier to use than when it has to be extracted from PDF. This article reviews existing work on analyzing images from the biomedical literature and develops ideas on how such images can become useful and usable for a variety of tasks, including finding visual evidence for rare or unusual cases. These resources offer possibilities to train machine learning tools, increasing the diversity of available data and thus possibly the robustness of the classifiers. Examples with histopathology data available on Twitter already show promising possibilities. This article also adds links to other accessible sources, for example via the ImageCLEF challenges.

Conference Type:
non-published full paper
Economie et Services
Institut Informatique de gestion
Daejeon, Korea, 5-8 January 2020
12 p.
Published in:
Proceedings of the 26th International Conference on Multimedia Modeling (MMM2020)

Note: The status of this file is: restricted

 Record created 2020-12-01, last modified 2020-12-04
