Paper Title

COLLIDER: A Robust Training Framework for Backdoor Data

Authors

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie

Abstract

Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. An adversary poisons some of the training data in such attacks by installing a trigger. The goal is to make the trained DNN output the attacker's desired class whenever the trigger is activated while performing as usual for clean data. Various approaches have recently been proposed to detect malicious backdoored DNNs. However, a robust, end-to-end training approach, like adversarial training, is yet to be discovered for backdoor poisoned data. In this paper, we take the first step toward such methods by developing a robust training framework, COLLIDER, that selects the most prominent samples by exploiting the underlying geometric structures of the data. Specifically, we effectively filter out candidate poisoned data at each training epoch by solving a geometrical coreset selection objective. We first argue how clean data samples exhibit (1) gradients similar to the clean majority of data and (2) low local intrinsic dimensionality (LID). Based on these criteria, we define a novel coreset selection objective to find such samples, which are used for training a DNN. We show the effectiveness of the proposed method for robust training of DNNs on various poisoned datasets, reducing the backdoor success rate significantly.
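
The abstract describes a per-epoch filtering step: keep the training samples whose gradients resemble those of the clean majority and whose local intrinsic dimensionality (LID) is low, and train only on that subset. The snippet below is a minimal illustrative sketch of this idea, not the authors' implementation: the paper formulates it as a coreset selection objective solved each epoch, whereas this sketch replaces that with a simple top-k scoring rule. The helper names (lid_mle, select_coreset), the keep_ratio, and the use of penultimate-layer features for LID are assumptions; the LID estimate uses the standard maximum-likelihood estimator of Amsaleg et al.

# Minimal sketch of the per-epoch filtering idea described above; not the
# authors' implementation. The scoring rule, keep_ratio, and helper names
# (lid_mle, select_coreset) are illustrative assumptions.
import numpy as np

def lid_mle(features, reference, k=20):
    """Maximum-likelihood LID estimate (Amsaleg et al.) for each row of
    `features`, using `reference` as the neighbour pool."""
    # Pairwise Euclidean distances, shape (n_samples, n_reference).
    dists = np.linalg.norm(features[:, None, :] - reference[None, :, :], axis=-1)
    dists = np.sort(dists, axis=1)[:, 1:k + 1]      # drop the zero self-distance
    r_k = dists[:, -1:]                              # distance to the k-th neighbour
    return -1.0 / np.mean(np.log(dists / r_k + 1e-12), axis=1)

def select_coreset(grad_feats, penult_feats, keep_ratio=0.8, lid_weight=1.0):
    """Keep the samples whose gradient proxies align with the mean gradient
    and whose LID (estimated on penultimate-layer features) is low."""
    mean_grad = grad_feats.mean(axis=0, keepdims=True)
    cos_sim = (grad_feats * mean_grad).sum(axis=1) / (
        np.linalg.norm(grad_feats, axis=1) * np.linalg.norm(mean_grad) + 1e-12)
    lid = lid_mle(penult_feats, penult_feats)
    # Higher score = more "clean-looking": gradient agreement minus a LID penalty.
    score = cos_sim - lid_weight * (lid - lid.mean()) / (lid.std() + 1e-12)
    n_keep = int(keep_ratio * len(score))
    return np.argsort(-score)[:n_keep]               # indices of retained samples

# Toy usage: 100 samples, 16-dim gradient proxies, 32-dim penultimate features.
rng = np.random.default_rng(0)
kept = select_coreset(rng.normal(size=(100, 16)), rng.normal(size=(100, 32)))
print(f"kept {len(kept)} of 100 samples for this training epoch")

In an actual training loop, the retained indices would define the mini-batches for the current epoch, and the selection would be recomputed as the network's gradients and features evolve.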
