Quantifying gender bias towards politicians in cross-lingual language models
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Quantifying gender bias towards politicians in cross-lingual language models. / Stańczak, Karolina; Choudhury, Sagnik Ray; Pimentel, Tiago; Cotterell, Ryan; Augenstein, Isabelle.
In: PLoS ONE, Vol. 18, No. 11 November, e0277640, 2023, p. 1-24.
Bibtex
@article{Stanczak2023QuantifyingGenderBias,
  title = "Quantifying gender bias towards politicians in cross-lingual language models",
  author = "Sta{\'n}czak, Karolina and Choudhury, Sagnik Ray and Pimentel, Tiago and Cotterell, Ryan and Augenstein, Isabelle",
  journal = "PLoS ONE",
  volume = "18",
  number = "11 November",
  pages = "1--24",
  year = "2023",
  doi = "10.1371/journal.pone.0277640",
  issn = "1932-6203",
}
RIS
TY - JOUR
T1 - Quantifying gender bias towards politicians in cross-lingual language models
AU - Stańczak, Karolina
AU - Choudhury, Sagnik Ray
AU - Pimentel, Tiago
AU - Cotterell, Ryan
AU - Augenstein, Isabelle
N1 - Publisher Copyright: © 2023 Stańczak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2023
Y1 - 2023
N2 - Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
AB - Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
U2 - 10.1371/journal.pone.0277640
DO - 10.1371/journal.pone.0277640
M3 - Journal article
C2 - 38015835
AN - SCOPUS:85178494544
VL - 18
SP - 1
EP - 24
JO - PLoS ONE
JF - PLoS ONE
SN - 1932-6203
IS - 11 November
M1 - e0277640
ER -
ID: 377801135
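The abstract above describes probing language models for the adjectives and verbs they generate around politicians' names. As a rough, hypothetical illustration of that general idea (not the paper's actual method, models, or data), the Python sketch below asks a multilingual masked language model for the most probable fillers of a simple "<name> is [MASK]." template; the model choice, template, and example names are assumptions made here purely for demonstration.

# Illustrative sketch only: shows the general idea of eliciting words a masked
# language model associates with a name. The model, template, and names below
# are assumptions, not the setup used in the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # assumed model; the study covers six architectures
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def top_mask_fillers(name: str, k: int = 10):
    """Return the k most probable fillers for the template '<name> is [MASK].'"""
    text = f"{name} is {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the [MASK] position and take the model's distribution over the vocabulary there.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    probs = logits[0, mask_pos].softmax(dim=-1)
    top = probs.topk(k)
    return [(tokenizer.decode([int(i)]).strip(), p.item()) for i, p in zip(top.indices, top.values)]

# Hypothetical politician names, used only to illustrate comparing fillers by gender.
for name in ["Angela Merkel", "Barack Obama"]:
    print(name, top_mask_fillers(name, k=5))

Comparing the filler distributions returned for female- and male-named politicians is, in spirit, the kind of association the study quantifies at scale, over 250k politicians, seven languages, and six model architectures.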