
The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Standard

The Hidden Folk : Linguistic Properties Encoded in Multilingual Contextual Character Representations. / Agirrezabal, Manex; Boldsen, Sidsel; Hollenstein, Nora.

Proceedings of the Workshop on Computation and Written Language (CAWL 2023). Toronto : Association for Computational Linguistics, 2023. pp. 6-13.

Harvard

Agirrezabal, M, Boldsen, S & Hollenstein, N 2023, The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations. in Proceedings of the Workshop on Computation and Written Language (CAWL 2023). Association for Computational Linguistics, Toronto, pp. 6-13. https://doi.org/10.18653/v1/2023.cawl-1.2

APA

Agirrezabal, M., Boldsen, S., & Hollenstein, N. (2023). The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations. In Proceedings of the Workshop on Computation and Written Language (CAWL 2023) (pp. 6-13). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.cawl-1.2

Vancouver

Agirrezabal M, Boldsen S, Hollenstein N. The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations. In Proceedings of the Workshop on Computation and Written Language (CAWL 2023). Toronto: Association for Computational Linguistics. 2023. p. 6-13. https://doi.org/10.18653/v1/2023.cawl-1.2

Author

Agirrezabal, Manex ; Boldsen, Sidsel ; Hollenstein, Nora. / The Hidden Folk : Linguistic Properties Encoded in Multilingual Contextual Character Representations. Proceedings of the Workshop on Computation and Written Language (CAWL 2023). Toronto : Association for Computational Linguistics, 2023. pp. 6-13

Bibtex

@inproceedings{d9eb7eb754234bcf93ac92ad68b1216a,

title = "The Hidden Folk: Linguistic Properties Encoded in Multilingual Contextual Character Representations",

abstract = "To gain a better understanding of the linguistic information encoded in character-based language models, we probe the multilingual contextual CANINE model. We design a range of phonetic probing tasks in six Nordic languages, including Faroese as an additional zero-shot instance. We observe that some phonetic information is indeed encoded in the character representations, as consonants and vowels can be well distinguished using a linear classifier. Furthermore, results for the Danish and Norwegian language seem to be worse for the consonant/vowel distinction in comparison to other languages. The information encoded in these representations can also be learned in a zero-shot scenario, as Faroese shows a reasonably good performance in the same vowel/consonant distinction task.",

author = "Manex Agirrezabal and Sidsel Boldsen and Nora Hollenstein",

year = "2023",

doi = "10.18653/v1/2023.cawl-1.2",

language = "English",

pages = "6--13",

booktitle = "Proceedings of the Workshop on Computation and Written Language (CAWL 2023)",

publisher = "Association for Computational Linguistics",

}

RIS

TY - GEN

T1 - The Hidden Folk

T2 - Linguistic Properties Encoded in Multilingual Contextual Character Representations

AU - Agirrezabal, Manex

AU - Boldsen, Sidsel

AU - Hollenstein, Nora

PY - 2023

Y1 - 2023

N2 - To gain a better understanding of the linguistic information encoded in character-based language models, we probe the multilingual contextual CANINE model. We design a range of phonetic probing tasks in six Nordic languages, including Faroese as an additional zero-shot instance. We observe that some phonetic information is indeed encoded in the character representations, as consonants and vowels can be well distinguished using a linear classifier. Furthermore, results for the Danish and Norwegian language seem to be worse for the consonant/vowel distinction in comparison to other languages. The information encoded in these representations can also be learned in a zero-shot scenario, as Faroese shows a reasonably good performance in the same vowel/consonant distinction task.

AB - To gain a better understanding of the linguistic information encoded in character-based language models, we probe the multilingual contextual CANINE model. We design a range of phonetic probing tasks in six Nordic languages, including Faroese as an additional zero-shot instance. We observe that some phonetic information is indeed encoded in the character representations, as consonants and vowels can be well distinguished using a linear classifier. Furthermore, results for the Danish and Norwegian language seem to be worse for the consonant/vowel distinction in comparison to other languages. The information encoded in these representations can also be learned in a zero-shot scenario, as Faroese shows a reasonably good performance in the same vowel/consonant distinction task.

U2 - 10.18653/v1/2023.cawl-1.2

DO - 10.18653/v1/2023.cawl-1.2

M3 - Article in proceedings

SP - 6

EP - 13

BT - Proceedings of the Workshop on Computation and Written Language (CAWL 2023)

PB - Association for Computational Linguistics

CY - Toronto

ER -

ID: 374969148