Naive regularizers for low-resource neural machine translation
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Naive regularizers for low-resource neural machine translation. / Beloucif, Meriem; Gonzalez, Ana Valeria; Bollmann, Marcel; Søgaard, Anders.
International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings. ed. / Galia Angelova; Ruslan Mitkov; Ivelina Nikolova; Irina Temnikova. Incoma Ltd, 2019. p. 102-111.
RIS
TY - GEN
T1 - Naive regularizers for low-resource neural machine translation
AU - Beloucif, Meriem
AU - Gonzalez, Ana Valeria
AU - Bollmann, Marcel
AU - Søgaard, Anders
PY - 2019
Y1 - 2019
N2 - Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k and 230k sentence pairs. Our best regularizer achieves an average improvement of 1.5 BLEU and 1.0 TER across all language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
AB - Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k and 230k sentence pairs. Our best regularizer achieves an average improvement of 1.5 BLEU and 1.0 TER across all language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
UR - http://www.scopus.com/inward/record.url?scp=85076491475&partnerID=8YFLogxK
U2 - 10.26615/978-954-452-056-4_013
DO - 10.26615/978-954-452-056-4_013
M3 - Article in proceedings
AN - SCOPUS:85076491475
SP - 102
EP - 111
BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
A2 - Angelova, Galia
A2 - Mitkov, Ruslan
A2 - Nikolova, Ivelina
A2 - Temnikova, Irina
PB - Incoma Ltd
T2 - 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Y2 - 2 September 2019 through 4 September 2019
ER -