Small batch sizes improve training of low-resource neural MT

Atrio, Alex R.; Popescu-Belis, Andrei

2021


Abstract

We study the role of an essential hyperparameter that governs the training of Transformers for neural machine translation in a low-resource setting: the batch size. Using theoretical insights and experimental evidence, we argue against the widespread belief that batch size should be set as large as allowed by the memory of the GPUs. We show that in a low-resource setting, a smaller batch size leads to higher scores in a shorter training time, and argue that this is due to better regularization of the gradients during training.

Details

Title Small batch sizes improve training of low-resource neural MT

Author(s) Atrio, Alex R. (School of Management and Engineering Vaud, HES-SO University of Applied Sciences Western Switzerland ; EPFL, Lausanne, Switzerland)
Popescu-Belis, Andrei (School of Management and Engineering Vaud, HES-SO University of Applied Sciences Western Switzerland ; EPFL, Lausanne, Switzerland)

Date 2021-12

Published in Proceedings of ICON 2021: 18th International Conference on Natural Language Processing

Publisher Assam, India, 16-19 December 2021

Pagination 7 p.

Presented at ICON 2021: 18th International Conference on Natural Language Processing, Silchar, Assam, India, 2021-12-16 to 2021-12-19

Paper type Published full paper

Field Engineering and Architecture

School HEIG-VD

Institute IICT - Institut des Technologies de l'Information et de la Communication

The document appears in Conference documents
Global

External resource(s) Online version
