Historical Text Normalization with Delayed Rewards
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Historical Text Normalization with Delayed Rewards. / Flachs, Simon; Bollmann, Marcel; Søgaard, Anders.
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2019. p. 1614-1619.
RIS
TY - GEN
T1 - Historical Text Normalization with Delayed Rewards
AU - Flachs, Simon
AU - Bollmann, Marcel
AU - Søgaard, Anders
PY - 2019
Y1 - 2019
N2 - Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models. Policy gradient training enables direct optimization for exact matches, and while the small datasets in historical text normalization are prohibitive of from-scratch reinforcement learning, we show that policy gradient fine-tuning leads to significant improvements across the board. Policy gradient training, in particular, leads to more accurate normalizations for long or unseen words.
AB - Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models. Policy gradient training enables direct optimization for exact matches, and while the small datasets in historical text normalization are prohibitive of from-scratch reinforcement learning, we show that policy gradient fine-tuning leads to significant improvements across the board. Policy gradient training, in particular, leads to more accurate normalizations for long or unseen words.
U2 - 10.18653/v1/P19-1157
DO - 10.18653/v1/P19-1157
M3 - Article in proceedings
SP - 1614
EP - 1619
BT - Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics
T2 - 57th Annual Meeting of the Association for Computational Linguistics
Y2 - 1 July 2019 through 1 July 2019
ER -
ID: 239617712