A simplified training pipeline for low-resource and unsupervised machine translation

Atrio, Alex R.; Allemann, Alexis; Dolamic, Ljiljana; Popescu-Belis, Andrei

2023

Abstract

Training neural MT systems for low-resource language pairs or in unsupervised settings (i.e., with no parallel data) often involves a large number of auxiliary systems. These may include parent systems trained on higher-resource pairs and used for initializing the parameters of child systems, multilingual systems for neighboring languages, and several stages of systems trained on pseudo-parallel data obtained through back-translation. We propose here a simplified pipeline, which we compare to the best submissions to the WMT 2021 Shared Task on Unsupervised MT and Very Low Resource Supervised MT. Our pipeline needs only two parents, two children, and one round of back-translation for low-resource directions and two for unsupervised ones, and it obtains better or similar scores when compared to more complex alternatives.

Details

Title A simplified training pipeline for low-resource and unsupervised machine translation

Author(s) Atrio, Alex R. (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)
Allemann, Alexis (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland)
Dolamic, Ljiljana (Armasuisse, W+T, Thun, Switzerland)
Popescu-Belis, Andrei (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)

Date 2023-05

Published in Proceedings of the 6th Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2023)

Volume pp. 47-58

Publisher Association for Computational Linguistics

Pagination 12 p.

Presented at Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT), Dubrovnik, Croatia, 2023-05-06

Paper type published full paper

Field Engineering and Architecture

School HEIG-VD

Institute IICT - Institut des Technologies de l'Information et de la Communication

This document appears in Conference papers
Global

External resource(s) Online publication
