Standard
Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts. / Al-Laith, Ali; Conroy, Alexander; Bjerring-Hansen, Jens; Hershcovich, Daniel.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). ed. / Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue. European Language Resources Association (ELRA), 2024. p. 4811-4819.
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
Al-Laith, A, Conroy, A, Bjerring-Hansen, J & Hershcovich, D 2024,
Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts. in N Calzolari, M-Y Kan, V Hoste, A Lenci, S Sakti & N Xue (eds),
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). European Language Resources Association (ELRA), pp. 4811-4819, Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy,
20/05/2024. <https://aclanthology.org/2024.lrec-main.431>
APA
Al-Laith, A., Conroy, A., Bjerring-Hansen, J., & Hershcovich, D. (2024).
Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts. In N. Calzolari, M-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.),
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 4811-4819). European Language Resources Association (ELRA).
https://aclanthology.org/2024.lrec-main.431
Vancouver
Al-Laith A, Conroy A, Bjerring-Hansen J, Hershcovich D. Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts. In Calzolari N, Kan M-Y, Hoste V, Lenci A, Sakti S, Xue N, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). European Language Resources Association (ELRA). 2024. p. 4811-4819.
Author
Al-Laith, Ali ; Conroy, Alexander ; Bjerring-Hansen, Jens ; Hershcovich, Daniel. / Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). editor / Nicoletta Calzolari ; Min-Yen Kan ; Veronique Hoste ; Alessandro Lenci ; Sakriani Sakti ; Nianwen Xue. European Language Resources Association (ELRA), 2024. pp. 4811-4819.
Bibtex
@inproceedings{1855d62cfdaf44838628d7d0f35020f5,
title = "Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts",
abstract = "We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.",
keywords = "Digital Humanities, Pre-trained Language Models, Sentiment Analysis, Word Sense Disambiguation",
author = "Ali Al-Laith and Alexander Conroy and Jens Bjerring-Hansen and Daniel Hershcovich",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association: CC BY-NC 4.0.; Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 ; Conference date: 20-05-2024 Through 25-05-2024",
year = "2024",
language = "English",
pages = "4811--4819",
editor = "Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
publisher = "European Language Resources Association (ELRA)",
}
RIS
TY - GEN
T1 - Development and Evaluation of Pre-trained Language Models for Historical Danish and Norwegian Literary Texts
AU - Al-Laith, Ali
AU - Conroy, Alexander
AU - Bjerring-Hansen, Jens
AU - Hershcovich, Daniel
N1 - Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.
PY - 2024
Y1 - 2024
N2 - We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.
AB - We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.
KW - Digital Humanities
KW - Pre-trained Language Models
KW - Sentiment Analysis
KW - Word Sense Disambiguation
M3 - Article in proceedings
AN - SCOPUS:85195912870
SP - 4811
EP - 4819
BT - Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
A2 - Calzolari, Nicoletta
A2 - Kan, Min-Yen
A2 - Hoste, Veronique
A2 - Lenci, Alessandro
A2 - Sakti, Sakriani
A2 - Xue, Nianwen
PB - European Language Resources Association (ELRA)
T2 - Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Y2 - 20 May 2024 through 25 May 2024
ER -