An extended overview of the CLEF 2020 ChEMU Lab : information extraction of chemical reactions from patents

He, Jiayuan (University of Melbourne, Australia) ; Nguyen, Dat Quoc (University of Melbourne, Australia ; VinAI Research, Vietnam) ; Akhondi, Saber A. (Elsevier BV, Amsterdam, The Netherlands) ; Druckenbrodt, Christian (Elsevier Information Systems GmbH, Frankfurt, Germany) ; Thorne, Camilo (Elsevier Information Systems GmbH, Frankfurt, Germany) ; Hoessel, Ralph (Elsevier Information Systems GmbH, Frankfurt, Germany) ; Afzal, Zubair (Elsevier BV, Amsterdam, The Netherlands) ; Zhai, Zenan (University of Melbourne, Australia) ; Fang, Biaoyan (University of Melbourne, Australia) ; Yoshikawa, Hiyori (University of Melbourne, Australia ; Fujitsu Laboratories Ltd., Japan) ; Albahem, Ameer (RMIT University, Melbourne, Australia) ; Wang, Jingqi (Melax Technologies, Inc, Houston, USA) ; Ren, Yuankai (Nantong University, Nantong, China) ; Zhang, Zhi (Nantong University, Nantong, China) ; Zhang, Yaoyun (Melax Technologies, Inc, Houston, USA) ; Dao, Mai Hoang (VinAI Research, Vietnam) ; Ruas, Pedro (LASIGE, Universidade de Lisboa, Lisbon, Portugal) ; Lamurias, Andre (LASIGE, Universidade de Lisboa, Lisbon, Portugal) ; Couto, Francisco M. 
(LASIGE, Universidade de Lisboa, Lisbon, Portugal) ; Copara, Jenny (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; Swiss Institute of Bioinformatics, Geneva, Switzerland ; University of Geneva, Geneva, Switzerland) ; Naderi, Nona (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Knafou, Julien (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; Swiss Institute of Bioinformatics, Geneva, Switzerland ; University of Geneva, Geneva, Switzerland) ; Ruch, Patrick (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Teodoro, Douglas (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale ; Swiss Institute of Bioinformatics, Geneva, Switzerland) ; Lowe, Daniel (Minesoft, Cambridge, United Kingdom) ; Mayfield, John (NextMove Software, Cambridge, United Kingdom) ; Köksal, Abdullatif (Bogazici University, Istanbul, Turkey) ; Dönmez, Hilal (Bogazici University, Istanbul, Turkey) ; Özkirimli, Elif (Bogazici University, Istanbul, Turkey ; Data and Analytics, F. Hoffmann-La Roche AG, Switzerland) ; Özgür, Arzucan (Bogazici University, Istanbul, Turkey) ; Mahendran, Darshini (Virginia Commonwealth University, Richmond, United States) ; Gurdin, Gabrielle (Virginia Commonwealth University, Richmond, United States) ; Lewinski, Nastassja (Virginia Commonwealth University, Richmond, United States) ; Tang, Christina (Virginia Commonwealth University, Richmond, United States) ; McInness, Bridget T. (Virginia Commonwealth University, Richmond, United States) ; Malarkodi, C.S.
(MIT Campus of Anna University, Chennai, India) ; Rao, Pattabhi Rk (MIT Campus of Anna University, Chennai, India) ; Devi, Sobha Lalitha (MIT Campus of Anna University, Chennai, India) ; Cavedon, Lawrence (RMIT University, Melbourne, Australia) ; Cohn, Trevor (University of Melbourne, Australia) ; Baldwin, Timothy (University of Melbourne, Australia) ; Verspoor, Karin (University of Melbourne, Australia)

The discovery of new chemical compounds is regarded as a key driver of the chemical industry and many other economic sectors. Information about new discoveries is usually disclosed in the scientific literature and, in particular, in chemical patents, since patents are often the first venue where new chemical compounds are publicized. Despite the significance of the information provided in chemical patents, extracting it is costly because of the large volume of existing patents and the rapid rate at which that volume grows. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), provides a platform to advance the state of the art in automatic information extraction systems over chemical patents. In particular, we focus on extracting the synthesis processes of new chemical compounds from chemical patents. Using the ChEMU corpus of 1500 "snippets" (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 targets chemical named entity recognition, i.e., the identification of chemical compounds and their specific roles in chemical reactions. Task 2 targets event extraction, i.e., the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. In this paper, we provide an overview of our ChEMU2020 lab. We describe the resources created for the two tasks, the evaluation methodology adopted, and participants' results. We also provide a brief summary of the methods employed by participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than the baseline methods prepared by the organizers.


Note: Due to the COVID-19 outbreak, the CLEF 2020 conference venue in Thessaloniki was cancelled. The proceedings of the online conference are however published according to the original schedule.


Conference Type: full paper
Faculty: Economie et Services
School: HEG - Genève
Institute: CRAG - Centre de Recherche Appliquée en Gestion
Subject(s): Informatique
Publisher: Thessaloniki, Greece, 22-25 September 2020
Date: 2020-09
Pagination: 31 p.
Published in: Proceedings of the CLEF 2020 conference

 Record created 2020-10-05, last modified 2020-10-28
