GPoeT: a language model trained for rhyme generation on synthetic data

Popescu-Belis, Andrei; Atrio, Alex R.; Bernath, Bastien; Boisson, Étienne; Ferrari, Teo; Theimer-Lienhardt, Xavier; Vernikos, Giorgos

2023

Abstract

Poem generation with language models requires the modeling of rhyming patterns. We propose a novel solution for learning to rhyme, based on synthetic data generated with a rule-based rhyming algorithm. The algorithm and an evaluation metric use a phonetic dictionary and the definitions of perfect and assonant rhymes. We fine-tune a GPT-2 English model with 124M parameters on 142 MB of natural poems and find that this model generates consecutive rhymes infrequently (11%). We then fine-tune the model on 6 MB of synthetic quatrains with consecutive rhymes (AABB) and obtain nearly 60% of rhyming lines in samples generated by the model. Alternating rhymes (ABAB) are more difficult to model because of longer-range dependencies, but they are still learnable from synthetic data, reaching 45% of rhyming lines in generated samples.
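The abstract names the ingredients of the rhyming algorithm and metric (a phonetic dictionary, perfect and assonant rhymes, AABB and ABAB quatrains) without spelling out the rules. The sketch below is a minimal illustration under stated assumptions, not the authors' code. A perfect rhyme is assumed to mean identical phonemes from the last stressed vowel to the end of the word, and an assonant rhyme identical vowels in that span with differing consonants. `PHONES` is a hypothetical six-word stand-in for a CMUdict-style ARPAbet dictionary. `rhyme_type` and `rhyming_line_ratio` are likewise hypothetical helpers, the second scoring a quatrain as the share of lines whose scheme partner rhymes with them.

```python
import string

# Hypothetical mini dictionary in CMUdict-style ARPAbet (vowels carry a stress digit).
# A real system would load a full phonetic dictionary instead.
PHONES = {
    "day": ["D", "EY1"],
    "way": ["W", "EY1"],
    "light": ["L", "AY1", "T"],
    "night": ["N", "AY1", "T"],
    "time": ["T", "AY1", "M"],
    "stone": ["S", "T", "OW1", "N"],
}

# Assumed line pairings for the two quatrain schemes named in the abstract.
SCHEMES = {"AABB": [(0, 1), (2, 3)], "ABAB": [(0, 2), (1, 3)]}


def is_vowel(phone):
    # In ARPAbet, vowel symbols end with a stress digit (0, 1 or 2).
    return phone[-1].isdigit()


def strip_stress(phone):
    return phone.rstrip("012")


def rhyme_part(phones):
    """Phonemes from the last stressed vowel (stress 1 or 2) to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if is_vowel(phones[i]) and phones[i][-1] in "12":
            return phones[i:]
    return phones  # no stressed vowel found: compare the whole word


def rhyme_type(w1, w2):
    """Return 'perfect', 'assonant', or None under the assumed definitions."""
    if w1 == w2 or w1 not in PHONES or w2 not in PHONES:
        return None  # repeated or unknown words are not counted as rhymes
    r1, r2 = rhyme_part(PHONES[w1]), rhyme_part(PHONES[w2])
    if [strip_stress(p) for p in r1] == [strip_stress(p) for p in r2]:
        return "perfect"
    vowels1 = [strip_stress(p) for p in r1 if is_vowel(p)]
    vowels2 = [strip_stress(p) for p in r2 if is_vowel(p)]
    if vowels1 == vowels2:
        return "assonant"  # same vowels, different consonants
    return None


def last_word(line):
    """Lower-cased final word of a line, without surrounding punctuation."""
    words = line.split()
    return words[-1].strip(string.punctuation).lower() if words else ""


def rhyming_line_ratio(quatrain, scheme="AABB"):
    """Share of lines whose partner line (per scheme) rhymes with them."""
    if len(quatrain) != 4:
        raise ValueError("expected a quatrain (4 lines)")
    ends = [last_word(line) for line in quatrain]
    rhyming = sum(2 for i, j in SCHEMES[scheme] if rhyme_type(ends[i], ends[j]))
    return rhyming / len(quatrain)


poem = [
    "We wandered down the way",
    "until the close of day,",
    "then under silver light",
    "we lingered through the night.",
]
print(rhyming_line_ratio(poem, "AABB"))  # 1.0
print(rhyming_line_ratio(poem, "ABAB"))  # 0.0
print(rhyme_type("night", "time"))  # assonant
```

The same quatrain scores 1.0 when read as AABB and 0.0 as ABAB, which is why the scheme has to be fixed before measuring. Counting both lines of each rhyming pair is only one reading of "rhyming lines"; the paper's exact metric may differ.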

Details

Title GPoeT: a language model trained for rhyme generation on synthetic data

Author(s) Popescu-Belis, Andrei (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)
Atrio, Alex R. (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)
Bernath, Bastien (EPFL, Lausanne, Switzerland)
Boisson, Étienne (EPFL, Lausanne, Switzerland)
Ferrari, Teo (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland)
Theimer-Lienhardt, Xavier (EPFL, Lausanne, Switzerland)
Vernikos, Giorgos (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)

Date 2023-05

Published in Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Pages 10-20

Publisher Association for Computational Linguistics

Pagination 11 pages

Presented at Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Dubrovnik, Croatia, 2023-05-06

Paper type Published full paper

Domain Engineering and Architecture

School HEIG-VD

Institute IICT - Institut des Technologies de l'Information et de la Communication (Institute of Information and Communication Technologies)

Appears in Conference documents, Global

External resource(s) Online Publication