Paper Title
Don't Forget About Pronouns: Removing Gender Bias in Language Models Without Losing Factual Gender Information
Paper Authors
Paper Abstract
The representations in large language models contain multiple types of gender information. We focus on two such signals in English texts: factual gender information, which is a grammatical or semantic property, and gender bias, which is the correlation between a word and a specific gender. We show that the model's embeddings can be disentangled, and we use probing to identify the components encoding each type of information. Our aim is to diminish stereotypical bias in the representations while preserving the factual gender signal. Our filtering method demonstrates that the bias of gender-neutral profession names can be reduced without significant deterioration of language modeling capabilities. These findings can be applied to language generation to mitigate reliance on stereotypes while preserving gender agreement in coreference.
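A minimal sketch of the disentangle-and-filter idea described in the abstract, not the authors' exact method. It assumes both signals can be approximated by linear directions in embedding space: the factual gender direction is taken as a difference of explicitly gendered word vectors, the bias direction stands in for a probing-classifier weight vector, and filtering is a linear projection (in the spirit of hard debiasing). All embeddings, names, and directions below are toy placeholders.

```python
# Sketch: remove a bias direction from an embedding while preserving
# a factual gender direction. Toy data; not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Placeholder embeddings; in practice these come from a language model.
emb_he, emb_she = rng.normal(size=(2, dim))   # factually gendered words
emb_nurse = rng.normal(size=dim)              # gender-neutral profession

# 1. Factual gender direction: difference of an explicitly gendered pair.
factual_dir = emb_he - emb_she
factual_dir /= np.linalg.norm(factual_dir)

# 2. Bias direction: stand-in for a probe's weight vector; the paper
#    identifies biased components with probing classifiers.
bias_dir = rng.normal(size=dim)
# Orthogonalize against the factual direction so that filtering the
# bias does not erase grammatical/semantic gender information.
bias_dir -= (bias_dir @ factual_dir) * factual_dir
bias_dir /= np.linalg.norm(bias_dir)

def filter_bias(v: np.ndarray) -> np.ndarray:
    """Project out the bias direction, leaving the factual signal intact."""
    return v - (v @ bias_dir) * bias_dir

debiased = filter_bias(emb_nurse)
print("bias component before:", emb_nurse @ bias_dir)
print("bias component after: ", debiased @ bias_dir)   # ~0 after filtering
print("factual component preserved:",
      np.isclose(emb_nurse @ factual_dir, debiased @ factual_dir))
```

Because the bias direction is orthogonalized against the factual direction before projection, the profession embedding loses its stereotypical component while its factual gender component is unchanged, which is the trade-off the abstract targets.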