Paper Title
FG-UAP: Feature-Gathering Universal Adversarial Perturbation
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) are susceptible to elaborately designed perturbations, whether such perturbations are dependent on or independent of images. The latter kind, called Universal Adversarial Perturbation (UAP), is very attractive for model robustness analysis, since its independence of the input reveals intrinsic characteristics of the model. Relatedly, another interesting observation is Neural Collapse (NC), which means that the feature variability may collapse during the terminal phase of training. Motivated by this, we propose to generate UAPs by attacking the layer where the NC phenomenon occurs. Because of NC, the proposed attack can gather the features of all natural images around its own, and is hence called Feature-Gathering UAP (FG-UAP). We evaluate the effectiveness of the proposed algorithm in extensive experiments, including untargeted and targeted universal attacks, attacks under limited datasets, and transfer-based black-box attacks across different architectures, including Vision Transformers, which are believed to be more robust. Furthermore, we investigate FG-UAP from the perspective of NC by analyzing the labels and extracted features of adversarial examples, and find that the collapse phenomenon becomes stronger after the model is corrupted. The code will be released upon acceptance of the paper.
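The abstract describes the attack only at a high level. As a rough illustration, a minimal PyTorch sketch of one plausible "feature-gathering" objective, maximizing the cosine similarity between the NC-layer features of perturbed images and the feature of the perturbation itself, could look as follows. All names (`fg_uap_sketch`, `feature_extractor`) and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def fg_uap_sketch(feature_extractor, data_loader, eps=10 / 255,
                  epochs=5, lr=0.01, device="cpu"):
    """Hypothetical sketch of a feature-gathering universal perturbation.

    `feature_extractor` is assumed to map a batch of images to the
    features of the layer where Neural Collapse is observed (e.g. the
    penultimate layer of a classifier); `eps` bounds the perturbation
    in the l_inf norm.
    """
    delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for images, _ in data_loader:
            images = images.to(device)
            adv_features = feature_extractor(torch.clamp(images + delta, 0, 1))
            uap_feature = feature_extractor(delta)
            # "Feature gathering": pull the features of perturbed images
            # toward the feature of the perturbation itself by maximizing
            # their cosine similarity (i.e., minimizing its negative).
            loss = -F.cosine_similarity(adv_features, uap_feature, dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # project back into the l_inf ball
    return delta.detach()
```

Cosine similarity is a natural choice here because NC concerns the directions of features rather than their magnitudes, but the authors' actual objective may differ in its details.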