Overview of the Transformer-based Models for NLP Tasks

Gillioz, Anthony (University of Neuchâtel, Neuchâtel, Switzerland) ; Casas, Jacky (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland) ; Mugellini, Elena (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland) ; Abou Khaled, Omar (School of Engineering and Architecture (HEIA-FR), HES-SO // University of Applied Sciences Western Switzerland)

In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. This architecture quickly revolutionized natural language processing. Models such as GPT and BERT, which rely on the Transformer architecture, have decisively outperformed the previous state-of-the-art networks, surpassing earlier approaches by such a wide margin that virtually all recent cutting-edge models build on Transformer-based architectures. In this paper, we provide an overview and explanation of the latest models. We cover auto-regressive models such as GPT, GPT-2, and XLNet, as well as auto-encoder architectures such as BERT and a number of post-BERT models, including RoBERTa, ALBERT, and ERNIE 1.0/2.0.
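As background for the models surveyed, the core operation of the Transformer, as defined in Vaswani et al. (2017), is scaled dot-product attention. A minimal statement of the formula, where Q, K, and V denote the query, key, and value matrices and d_k the key dimension:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
\]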


Conference type:
published full paper
Faculty:
Engineering and Architecture
School:
HEIA-FR
Institute:
HumanTech - Technology for Human Wellbeing Institute
Bibliographic address:
Sofia, Bulgaria, 6-9 September 2020
Date:
2020-09
Pagination:
5 p.
Published in:
Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, 6-9 September 2020, Sofia, Bulgaria ; Annals of Computer Sciences and Information Sciences
Numbering (vol. no.):
2020, vol. 21, pp. 179-183
DOI:
ISSN:
2300-5963
ISBN:
978-83-955416-7-4
Record created 2021-01-12, last modified 2021-01-20
