Evaluation-as-a-Service for the computational sciences: overview and outlook

Hopfgartner, Frank (University of Sheffield, United Kingdom) ; Hanbury, Allan (TU Wien, Complexity Science Hub Vienna, Vienna, Austria) ; Müller, Henning (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Eggel, Ivan (University of Applied Sciences and Arts Western Switzerland (HES-SO Valais-Wallis)) ; Balog, Krisztian (University of Stavanger, Stavanger, Norway) ; Brodt, Torben (plista GmbH, Berlin, Germany) ; Cormack, Gordon V. (University of Waterloo, Waterloo, Canada) ; Lin, Jimmy (University of Waterloo, Waterloo, Canada) ; Kalpathy-Cramer, Jayashree (Athinoula A. Martinos Center for Biomedical Imaging at Massachusetts General Hospital and Harvard Medical School, Charlestown, MA, USA) ; Kando, Noriko (National Institute of Informatics, Tokyo, Japan) ; Kato, Makoto P. (Kyoto University, Yoshida Honmachi, Sakyo, Kyoto, Japan) ; Krithara, Anastasia (National Center for Scientific Research "Demokritos", Paraskevi, Athens, Greece) ; Gollub, Tim (Bauhaus-Universität Weimar, Weimar, Germany) ; Potthast, Martin (Leipzig University, Leipzig, Germany) ; Viegas, Evelyne (Microsoft Research, Redmond, WA, USA) ; Mercer, Simon (Independent Consultant)

Evaluation in empirical computer science is essential to show progress and to assess the technologies developed. Several research domains, such as information retrieval, have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted to this day. In recent years, however, several new challenges have emerged that do not fit this paradigm well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their problems, particularly in the field of machine learning. This white paper is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants for local processing, but of keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables. The objectives of this white paper are to summarize and compare the current approaches and to consolidate the experiences gained with them in order to outline the next steps of EaaS, particularly towards sustainable research infrastructures. The white paper summarizes several existing approaches to EaaS, analyzes their usage scenarios as well as their advantages and disadvantages, gives an overview of the many factors influencing EaaS, and describes the environment in terms of the motivations of the various stakeholders, from funding agencies and challenge organizers to researchers, participants, and industry interested in supplying real-world problems for which they require solutions.
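The core inversion behind EaaS described above can be illustrated with a minimal sketch: the organizer keeps the test collection central and the participant submits an executable, receiving only an aggregate score. All names here (`evaluate`, `my_system`, the toy data set) are hypothetical illustrations, not an API from the white paper or from any of the platforms it surveys.

```python
# Minimal sketch of the EaaS idea, under assumed names: the hidden test
# data stays on the organizer's side, and the participant ships code
# rather than downloading the data.

def evaluate(system, hidden_data):
    """Organizer-side evaluation: run the submitted system on data it
    never sees directly and return only an aggregate score (accuracy)."""
    correct = sum(1 for x, label in hidden_data if system(x) == label)
    return correct / len(hidden_data)

# Central test collection, invisible to participants (toy example).
HIDDEN_TEST_SET = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]

# Participant-side submission: an executable artifact, not a data request.
def my_system(x):
    return "even" if x % 2 == 0 else "odd"

score = evaluate(my_system, HIDDEN_TEST_SET)
print(score)  # → 1.0; the participant receives only the score
```

In a real EaaS deployment the boundary between the two sides is a network API, a VM, or a container submission system rather than a function call, which is precisely what allows confidential or very large data sets to be used for evaluation without ever being distributed.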

Article Type:
Economie et Services
Institut Informatique de gestion
32 p.
Published in:
Journal of data and information quality
Numeration (vol. no.):
October 2018, vol. 10, no. 4, pp. 1-32

 Record created 2016-09-27, last modified 2020-10-27

