SIB text mining at TREC 2018 precision medicine track

Pasche, Emilie (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Rijen, Paul van (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Gobeill, Julien (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Mottaz, Anaïs (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland ; University of Geneva, Switzerland) ; Mottin, Luc (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Ruch, Patrick (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland)

The TREC 2018 Precision Medicine Track largely repeats the structure and evaluation of the 2017 track, and the collection remains identical. Our team again participated in both tasks of the track: 1) retrieving scientific abstracts addressing relevant treatments for a given case and 2) retrieving clinical trials for which a patient is eligible. For the retrieval of scientific abstracts, we queried all abstracts concerning one of the entities of the topic (i.e. the disease, the gene or the genetic variant) using various strategies (e.g. search in the annotations of the collection, free-text search with or without synonyms, search in the MeSH terms). Then, for a given topic, the complete set of abstracts was built by generating different queries with decreasing levels of specificity: we started with a very specific query containing the gene, disease and variant, from which less specific queries were derived. Abstracts were then re-ranked using different strategies to favor those that we considered more relevant to the task. First, in 2017 we tested the use of drug densities to identify abstracts related to treatment; this year we refined this strategy by giving more weight to drugs related to cancer treatment. Second, we used demographic information to favor abstracts concerning patients of the specified age group and gender, and to disfavor abstracts targeting patients of another age group or gender. Third, we used a word-level convolutional neural network to raise the rank of abstracts related to precision medicine. The fourth strategy consisted of expanding the query to parent and child diseases. Finally, we tested an exact run that retrieved only abstracts matching all the information given in the topic. Results showed that all strategies but the last one yielded some improvement in the retrieval effectiveness of the engine.
As expected, our final run, focusing on precision, produced our best results for precision at rank 10, while other measures were negatively impacted. Regarding the retrieval of scientific abstracts, we boosted last year's approach, which achieved competitive results, with supplementary strategies drawn from other participants. Regarding the retrieval of clinical trials, we investigated filtering strategies for handling the condition (disease), and standard information retrieval for handling the gene and genetic variant. The results show that, despite the presence of a structured condition tag in the document, better performance is obtained when relaxing constraints: using synonyms and detecting the diseases in various fields, such as the summary.
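
The idea of generating queries with decreasing levels of specificity can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy collection and the order in which entities are dropped are all assumptions made for illustration, and the abstract does not say how results from the different queries were actually combined.

```python
from itertools import combinations

# Hypothetical toy collection: each abstract is reduced to its annotated entities.
TOY_DOCS = {
    "d1": {"gene": "BRAF", "disease": "melanoma", "variant": "V600E"},
    "d2": {"gene": "BRAF", "disease": "melanoma"},
    "d3": {"disease": "melanoma"},
    "d4": {"gene": "KRAS", "disease": "colorectal cancer"},
}


def relaxed_queries(topic):
    """Yield queries from most to least specific (assumed relaxation order)."""
    fields = [f for f in ("gene", "disease", "variant") if topic.get(f)]
    for size in range(len(fields), 0, -1):
        for combo in combinations(fields, size):
            yield {f: topic[f] for f in combo}


def toy_search(query):
    """Return ids of toy documents whose annotations match every query field."""
    return [
        doc_id
        for doc_id, annotations in TOY_DOCS.items()
        if all(annotations.get(f) == v for f, v in query.items())
    ]


def merge_results(topic, search):
    """Run each query in turn; a document keeps the position of its most specific match."""
    seen, ranked = set(), []
    for query in relaxed_queries(topic):
        for doc_id in search(query):
            if doc_id not in seen:
                seen.add(doc_id)
                ranked.append(doc_id)
    return ranked
```

In this sketch, a document matching gene, disease and variant is listed before one matching only the disease, and a re-ranking step such as those described in the abstract would then be applied to the merged list.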


Conference Type:
full paper
Faculty:
Economie et Services
School:
HEG - Genève
Institute:
CRAG - Centre de Recherche Appliquée en Gestion
Subject(s):
Information science
Publisher:
Gaithersburg, USA, 14-16 November 2018
Date:
2018-11
Pagination:
7 p.
Published in:
Proceedings of the TREC 2018 Conference

 Record created 2019-10-11, last modified 2019-10-22
