Paper Title
MetaPoison: Practical General-purpose Clean-label Data Poisoning
Paper Authors
W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein
Paper Abstract
Data poisoning -- the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data -- is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought to be intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which until now has not been feasible for clean-label attacks on deep networks. MetaPoison can achieve arbitrary adversary goals -- such as using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison.
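For readers unfamiliar with the bilevel framing the abstract refers to, the poisoning task can be written as a nested optimization: the attacker perturbs a small set of poison examples, and the constraint is what a victim model trained on the poisoned dataset would predict on the target. The notation below is illustrative rather than copied verbatim from the paper: $X_p$ are the poison examples (perturbed within an imperceptibility budget, labels unchanged), $X_c$ the clean training data, $x_t$ the target image, and $y_{\mathrm{adv}}$ the adversarially chosen label.

```latex
% Bilevel formulation of targeted clean-label poisoning (notation illustrative).
\begin{aligned}
\min_{X_p}\quad & \mathcal{L}_{\mathrm{adv}}\bigl(x_t,\, y_{\mathrm{adv}};\ \theta^*(X_p)\bigr) \\
\text{s.t.}\quad & \theta^*(X_p) \in \arg\min_{\theta}\ \mathcal{L}_{\mathrm{train}}\bigl(X_c \cup X_p;\ \theta\bigr)
\end{aligned}
```

Solving the inner problem exactly for a deep network at every outer step is what makes the problem look intractable; the first-order approach instead differentiates through only a few unrolled training steps.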
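Below is a minimal sketch of that unrolled, first-order idea, assuming PyTorch and a toy linear model. The hyperparameters (K, inner_lr, eps) and helper names are illustrative, not from the paper, and the sketch omits ingredients the actual method relies on, such as ensembling over many networks at staggered training stages.

```python
# Minimal sketch of unrolled-SGD poison crafting in the spirit of MetaPoison.
# Toy setup (illustrative, not the paper's): linear model, random data.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D, C = 32, 10                          # feature dim, number of classes
x_clean = torch.randn(64, D)           # clean training batch
y_clean = torch.randint(0, C, (64,))
x_poison_base = torch.randn(8, D)      # images the attacker may perturb
y_poison = torch.randint(0, C, (8,))   # poisons keep their true ("clean") labels
x_target = torch.randn(1, D)           # target image to misclassify
y_adv = torch.tensor([3])              # adversarially chosen label for the target

delta = torch.zeros_like(x_poison_base, requires_grad=True)  # poison perturbation
opt_delta = torch.optim.Adam([delta], lr=0.05)
eps, K, inner_lr = 0.1, 2, 0.1         # perturbation budget, unroll depth, inner lr

for step in range(100):
    # Fresh weights each crafting step; the real method additionally ensembles
    # over network copies at different training epochs so poisons transfer.
    w = torch.zeros(D, C, requires_grad=True)

    # Unroll K steps of inner training on clean + poisoned data, keeping the
    # computation graph so gradients can flow back to delta.
    for _ in range(K):
        x_train = torch.cat([x_clean, x_poison_base + delta])
        y_train = torch.cat([y_clean, y_poison])
        inner_loss = F.cross_entropy(x_train @ w, y_train)
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - inner_lr * g           # differentiable SGD step

    # Outer (adversarial) loss: the would-be-trained model should put the
    # target image into the adversary's chosen class.
    adv_loss = F.cross_entropy(x_target @ w, y_adv)
    opt_delta.zero_grad()
    adv_loss.backward()
    opt_delta.step()
    with torch.no_grad():              # project back into the l_inf budget
        delta.clamp_(-eps, eps)
```

The key design point mirrored here is that the outer gradient reaches delta only through the K unrolled training steps, turning the intractable bilevel problem into an ordinary first-order optimization over the poison perturbations.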