Overview of the Transformer-based Models for NLP Tasks

Gillioz, Anthony (University of Neuchâtel, Neuchâtel, Switzerland) ; Casas, Jacky (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland) ; Mugellini, Elena (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland) ; Abou Khaled, Omar (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland)

In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. This architecture quickly revolutionized natural language processing: models such as GPT and BERT, built on the Transformer, outperformed the previous state-of-the-art networks by such a wide margin that virtually all recent cutting-edge models rely on Transformer-based architectures. In this paper, we provide an overview and explanation of these latest models. We cover auto-regressive models such as GPT, GPT-2, and XLNet, as well as auto-encoder architectures such as BERT and numerous post-BERT models like RoBERTa, ALBERT, and ERNIE 1.0/2.0.
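As a rough illustration (not taken from the paper itself), the sketch below shows scaled dot-product self-attention, the core operation of the Transformer. The optional causal flag hints at the distinction the abstract draws between auto-regressive models (GPT-style, which attend only to earlier positions) and auto-encoder models (BERT-style, which attend to the full context). Function and variable names, and the toy dimensions, are illustrative assumptions, not from the paper.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, causal=False):
        # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
        if causal:
            # Auto-regressive masking: each position may only attend to itself
            # and earlier positions (as in GPT / GPT-2 style decoding).
            mask = np.triu(np.ones_like(scores), k=1).astype(bool)
            scores = np.where(mask, -1e9, scores)
        # Softmax over the key dimension, then weighted sum of the values.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    # Toy example: 4 tokens with 8-dimensional representations.
    x = np.random.randn(4, 8)
    bidirectional_out = scaled_dot_product_attention(x, x, x, causal=False)  # BERT-style
    autoregressive_out = scaled_dot_product_attention(x, x, x, causal=True)  # GPT-style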


Conference Type:
published full paper
Faculty:
Ingénierie et Architecture
School:
HEIA-FR
Institute:
HumanTech - Technology for Human Wellbeing Institute
Conference location and date:
Sofia, Bulgaria, 6-9 September 2020
Date:
2020-09
Pagination:
5 p.
Published in:
Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, 6-9 September 2020, Sofia, Bulgaria ; Annals of Computer Science and Information Systems
Numeration (vol. no.):
2020, vol. 21, pp. 179-183
ISSN:
2300-5963
ISBN:
978-83-955416-7-4



