Paper Title

Simpson's Paradox in Recommender Fairness: Reconciling differences between per-user and aggregated evaluations

Authors

Flavien Prost, Ben Packer, Jilin Chen, Li Wei, Pierre Kremp, Nicholas Blumm, Susan Wang, Tulsee Doshi, Tonia Osadebe, Lukasz Heldt, Ed H. Chi, Alex Beutel

Abstract

There has been a flurry of research in recent years on notions of fairness in ranking and recommender systems, particularly on how to evaluate whether a recommender allocates exposure equally across groups of relevant items (also known as provider fairness). While this research has laid an important foundation, it has given rise to different approaches depending on whether relevant items are compared per-user/per-query or aggregated across users. Although both notions are established and intuitive, we discover that they can lead to opposite conclusions, a form of Simpson's Paradox. We reconcile these notions, show that the tension is due to differences in the distributions of users for whom items are relevant, and break down the important factors of the user's recommendations. Based on this new understanding, practitioners might be interested in either notion, but might face challenges with the per-user metric due to the partial observability of relevance and user satisfaction that is typical in real-world recommenders. We describe a technique based on distribution matching to estimate the per-user metric in such a scenario. We demonstrate the effectiveness and usefulness of this approach on simulated and real-world recommender data.
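To make the disagreement between the two notions concrete, the following is a minimal toy sketch and not the paper's metric or data. The users `u1` and `u2`, the item groups `A` and `B`, the exposure values, and the helper names `aggregated_gap` and `per_user_gaps` are all hypothetical, chosen only so that group A items are mostly relevant to the user who gives little exposure overall.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one (user, item_group, exposure) row per relevant item.
# Group A items are mostly relevant to u2, group B items mostly to u1.
records = (
    [("u1", "A", 0.9)] * 1 + [("u1", "B", 0.8)] * 9
    + [("u2", "A", 0.3)] * 9 + [("u2", "B", 0.2)] * 1
)

def aggregated_gap(rows):
    """Mean exposure of group A minus mean exposure of group B over all rows."""
    by_group = defaultdict(list)
    for _, group, exposure in rows:
        by_group[group].append(exposure)
    return mean(by_group["A"]) - mean(by_group["B"])

def per_user_gaps(rows):
    """The same A-minus-B gap, computed separately within each user's rows."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    return {user: aggregated_gap(user_rows) for user, user_rows in by_user.items()}

per_user = per_user_gaps(records)
aggregate = aggregated_gap(records)
print("per-user gaps (A - B):", per_user)
print("aggregated gap (A - B):", aggregate)
```

In this toy setup every per-user gap is positive (each user exposes group A more than group B), while the aggregated gap is negative, because most group A rows come from the low-exposure user. The reversal is driven only by the difference in the distribution of users for whom each group's items are relevant.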
