Paper Title

Poisoning Attacks on Algorithmic Fairness

Authors

David Solans, Battista Biggio, Carlos Castillo

Abstract

Research in adversarial machine learning has shown how the performance of machine learning models can be seriously compromised by injecting even a small fraction of poisoning points into the training data. While the effects on model accuracy of such poisoning attacks have been widely studied, their potential effects on other model performance metrics remain to be evaluated. In this work, we introduce an optimization framework for poisoning attacks against algorithmic fairness, and develop a gradient-based poisoning attack aimed at introducing classification disparities among different groups in the data. We empirically show that our attack is effective not only in the white-box setting, in which the attacker has full access to the target model, but also in a more challenging black-box scenario in which the attacks are optimized against a substitute model and then transferred to the target model. We believe that our findings pave the way towards the definition of an entirely novel set of adversarial attacks targeting algorithmic fairness in different scenarios, and that investigating such vulnerabilities will help design more robust algorithms and countermeasures in the future.
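
To make the idea concrete, below is a minimal, illustrative sketch of a gradient-based fairness-poisoning attack, not the authors' exact algorithm. It assumes a logistic-regression victim trained by plain gradient descent, uses the demographic-parity gap as the disparity objective, and differentiates that objective through an unrolled copy of the victim's training loop. All names (make_data, train_unrolled, disparity, N_POISON) and the synthetic data are hypothetical choices for this sketch.

```python
# Sketch only: unrolled bilevel poisoning against a fairness metric.
# The paper's actual formulation and optimization details may differ.
import torch

torch.manual_seed(0)

def make_data(n=400):
    """Synthetic binary task with a binary group attribute g."""
    g = torch.randint(0, 2, (n,)).float()        # group membership
    x = torch.randn(n, 2) + g.unsqueeze(1)       # features shifted per group
    y = (x.sum(dim=1) + 0.3 * torch.randn(n) > 1).float()
    return x, y, g

def train_unrolled(x, y, steps=50, lr=0.5):
    """Train logistic regression with plain GD, keeping the computation
    graph (create_graph=True) so the attacker can backpropagate through
    the whole training run."""
    w = torch.zeros(x.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    for _ in range(steps):
        logits = x @ w + b
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
        gw, gb = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def disparity(x, g, w, b):
    """Demographic-parity gap: difference in mean positive-prediction rate."""
    p = torch.sigmoid(x @ w + b)
    return (p[g == 1].mean() - p[g == 0].mean()).abs()

x, y, g = make_data()

# Baseline: disparity of a model trained only on clean data.
w0, b0 = train_unrolled(x, y)
print(f"clean disparity: {disparity(x, g, w0, b0).item():.3f}")

N_POISON = 20  # small fraction of poisoning points, as in the abstract
xp = torch.randn(N_POISON, 2).requires_grad_(True)   # poison features
yp = torch.randint(0, 2, (N_POISON,)).float()        # poison labels (fixed here)
opt = torch.optim.Adam([xp], lr=0.1)

for it in range(30):
    opt.zero_grad()
    w, b = train_unrolled(torch.cat([x, xp]), torch.cat([y, yp]))
    # The attacker maximizes disparity on clean data -> minimize its negative.
    loss = -disparity(x, g, w, b)
    loss.backward()
    opt.step()

print(f"disparity after poisoning: {-loss.item():.3f}")
```

The black-box transfer setting described in the abstract could be mimicked on top of this sketch by treating the logistic-regression model as the substitute: after optimizing xp and yp against it, a different target model would be retrained on the poisoned set and its disparity measured, without the attacker ever touching the target's gradients.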
