Paper Title
Investigating Bias in Image Classification using Model Explanations
Paper Authors
Paper Abstract
We evaluated whether model explanations can effectively detect bias in image classification by highlighting discriminating features, thereby removing the reliance on sensitive attributes for fairness calculations. To this end, we formulated important characteristics for bias detection and observed how explanations change as the degree of bias in a model changes. The paper identifies strengths and best practices for detecting bias using explanations, as well as three main weaknesses: explanations poorly estimate the degree of bias, can introduce additional bias into the analysis, and are sometimes inefficient in terms of the human effort involved.