Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias. / Gonzalez, Ana Valeria; Barrett, Maria Jung; Hvingelby, Rasmus; Søgaard, Anders; Webster, Kellie.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020. p. 2637–2648.Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
AU - Gonzalez, Ana Valeria
AU - Barrett, Maria Jung
AU - Hvingelby, Rasmus
AU - Søgaard, Anders
AU - Webster, Kellie
PY - 2020
Y1 - 2020
N2 - The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.
AB - The one-sided focus on English in previous studies of gender bias in NLP misses out on opportunities in other languages: English challenge datasets such as GAP and WinoGender highlight model preferences that are “hallucinatory”, e.g., disambiguating gender-ambiguous occurrences of ‘doctor’ as male doctors. We show that for languages with type B reflexivization, e.g., Swedish and Russian, we can construct multi-task challenge datasets for detecting gender bias that lead to unambiguously wrong model predictions: In these languages, the direct translation of ‘the doctor removed his mask’ is not ambiguous between a coreferential reading and a disjoint reading. Instead, the coreferential reading requires a non-gendered pronoun, and the gendered, possessive pronouns are anti-reflexive. We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks and focuses only on this phenomenon. We find evidence for gender bias across all task-language combinations and correlate model bias with national labor market statistics.
U2 - 10.18653/v1/2020.emnlp-main.209
DO - 10.18653/v1/2020.emnlp-main.209
M3 - Article in proceedings
SP - 2637
EP - 2648
BT - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
PB - Association for Computational Linguistics
T2 - The 2020 Conference on Empirical Methods in Natural Language Processing
Y2 - 16 November 2020 through 20 November 2020
ER -
ID: 258399669