Paper title
RayS: A Ray Searching Method for Hard-label Adversarial Attack
Paper authors
Paper abstract
Deep neural networks are vulnerable to adversarial attacks. Among the different attack settings, the most challenging yet most practical one is the hard-label setting, where the attacker has access only to the hard-label output (the predicted label) of the target model. Previous attempts are neither effective enough in terms of attack success rate nor efficient enough in terms of query complexity under the widely used $L_\infty$ norm threat model. In this paper, we present the Ray Searching attack (RayS), which greatly improves both the effectiveness and the efficiency of hard-label attacks. Unlike previous works, we reformulate the continuous problem of finding the closest decision boundary into a discrete problem that does not require any zeroth-order gradient estimation. Meanwhile, all unnecessary searches are eliminated via a fast check step. This significantly reduces the number of queries needed for our hard-label attack. Moreover, interestingly, we found that the proposed RayS attack can also be used as a sanity check for possible "falsely robust" models. On several recently proposed defenses that claim to achieve state-of-the-art robust accuracy, our attack method demonstrates that current white-box/black-box attacks can still give a false sense of security, and that the gap in robust accuracy between the most popular PGD attack and the RayS attack can be as large as $28\%$. We believe that our proposed RayS attack could help identify falsely robust models that beat most white-box/black-box attacks.
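To make the idea in the abstract concrete, the following is a minimal sketch of a ray search over sign directions with a fast check step. It is not the authors' implementation. It assumes a toy linear classifier as the hard-label oracle, and every name in it (`make_linear_oracle`, `ray_search`, `max_radius`, `tol`) is hypothetical. The block-wise sign flipping is one assumed way to organize the discrete search, and the binary search assumes the label changes only once along each ray, which holds for the linear toy model but not for general networks.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def make_linear_oracle(w: Vector, b: float) -> Callable[[Vector], int]:
    """Toy stand-in for a target model: it exposes only a hard label (0 or 1)."""
    def oracle(x: Vector) -> int:
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
    return oracle


def ray_search(oracle: Callable[[Vector], int], x: Vector, y: int,
               max_radius: float = 10.0, tol: float = 1e-3
               ) -> Tuple[Vector, float, int]:
    """Search sign directions d in {-1, +1}^n for a small radius r such that
    oracle(x + r * d) != y. Since every |d_i| is 1, r is the L_inf norm of
    the perturbation. Returns (best direction, best radius, query count)."""
    n = len(x)
    queries = 0

    def is_adversarial(d: Vector, r: float) -> bool:
        nonlocal queries
        queries += 1  # each call costs one hard-label query
        return oracle([xi + r * di for xi, di in zip(x, d)]) != y

    def refine(d: Vector, hi: float) -> float:
        # Binary search along the ray; hi is known to be adversarial.
        lo = 0.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if is_adversarial(d, mid):
                hi = mid
            else:
                lo = mid
        return hi

    best_d = [1.0] * n
    best_r = float("inf")
    if is_adversarial(best_d, max_radius):
        best_r = refine(best_d, max_radius)

    block = n
    while block >= 1:
        for start in range(0, n, block):
            cand = list(best_d)
            for i in range(start, min(start + block, n)):
                cand[i] = -cand[i]  # flip the signs of one block
            probe = min(best_r, max_radius)
            # Fast check: one query at the current best radius. If the
            # candidate is not adversarial there, it cannot improve on the
            # best radius (under the monotonicity assumption), so skip it.
            if not is_adversarial(cand, probe):
                continue
            best_d, best_r = cand, refine(cand, probe)
        block //= 2  # move to finer blocks
    return best_d, best_r, queries


# Usage on the toy model: attack the origin, which the oracle labels as 1.
oracle = make_linear_oracle([1.0, -2.0, 0.5, 3.0], 0.5)
x0 = [0.0, 0.0, 0.0, 0.0]
direction, radius, n_queries = ray_search(oracle, x0, oracle(x0))
```

In this sketch the fast check is what saves queries: a candidate direction costs a single query unless it already beats the current best radius, and only then is a binary search spent on it. No gradient estimate is used at any point; the attacker sees only hard labels.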