neXtA5 : accelerating annotation of articles via automated approaches in neXtProt

Mottin, Luc (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Gobeill, Julien (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Pasche, Emilie (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Michel, Pierre-André (Calipho Group, Swiss Institute of Bioinformatics) ; Cusin, Isabelle (Calipho Group, Swiss Institute of Bioinformatics) ; Gaudet, Pascale (Calipho Group, Swiss Institute of Bioinformatics) ; Ruch, Patrick (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale)

The rapid increase in the number of published articles makes it difficult for curated databases to remain up to date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. neXtA5 is a curation service composed of three main components. The first component is a named-entity recognition module, which annotates MEDLINE along predefined axes. This report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered by a set of JSON web services are currently being implemented into the neXtProt annotation pipeline.
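
The merging step described in the abstract, a linear combination of the search-engine ranking and the axis-specific entity-density ranking, can be illustrated with a minimal sketch. This is not the neXtA5 implementation: the function names, the min-max normalization of scores, the single weight `alpha` and the toy scores are all assumptions made for illustration, and the tuned coefficients are reported in the article itself, not in this record. A reciprocal-rank helper is included because the abstract evaluates the top-returned PMID with that measure.

```python
from typing import Dict, Iterable, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two rankings are comparable (assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tie: give each the same normalized score.
        return {pmid: 1.0 for pmid in scores}
    return {pmid: (s - lo) / (hi - lo) for pmid, s in scores.items()}


def combine_rankings(
    engine_scores: Dict[str, float],
    density_scores: Dict[str, float],
    alpha: float,
) -> List[str]:
    """Merge two scored lists of PMIDs with a linear combination.

    alpha is a hypothetical weight in [0, 1]: 1.0 keeps only the search-engine
    ranking, 0.0 keeps only the entity-density ranking.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    engine = min_max_normalize(engine_scores)
    density = min_max_normalize(density_scores)
    pmids = set(engine) | set(density)
    combined = {
        pmid: alpha * engine.get(pmid, 0.0) + (1.0 - alpha) * density.get(pmid, 0.0)
        for pmid in pmids
    }
    # Sort by descending combined score; break ties by PMID for determinism.
    return sorted(pmids, key=lambda pmid: (-combined[pmid], pmid))


def reciprocal_rank(ranking: Iterable[str], relevant: set) -> float:
    """Return 1/rank of the first relevant PMID, or 0.0 if none is found."""
    for position, pmid in enumerate(ranking, start=1):
        if pmid in relevant:
            return 1.0 / position
    return 0.0


# Toy scores with placeholder identifiers (not real PMIDs or real results).
engine_scores = {"A": 3.0, "B": 2.0, "C": 1.0}
density_scores = {"A": 0.0, "B": 1.0, "C": 5.0}
merged = combine_rankings(engine_scores, density_scores, alpha=0.5)
```

Averaging `reciprocal_rank` over a set of benchmark queries gives the mean reciprocal rank; under the assumptions above, sweeping `alpha` per axis and keeping the value that maximizes that average is one plausible way to tune the combination.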


Article type:
scientific
Faculty:
Economie et Services
School:
HEG GE Haute école de gestion de Genève
Institute:
CRAG - Centre de Recherche Appliquée en Gestion
Classification:
Information sciences
Date:
2016
Pagination:
9 p.
Published in:
Database : the journal of biological databases and curation
Numbering (vol. no.):
2016, baw098



Record created 2016-10-03, last modified 2018-08-31
