Paper Title
How to compare adversarial robustness of classifiers from a global perspective
Paper Authors
Abstract
Adversarial robustness of machine learning models has attracted considerable attention over recent years. Adversarial attacks undermine the reliability of and trust in machine learning models, but the construction of more robust models hinges on a rigorous understanding of adversarial robustness as a property of a given model. Point-wise measures for specific threat models are currently the most popular tool for comparing the robustness of classifiers and are used in most recent publications on adversarial robustness. In this work, we use recently proposed robustness curves to show that point-wise measures fail to capture important global properties that are essential to reliably compare the robustness of different classifiers. We introduce new ways in which robustness curves can be used to systematically uncover these properties and provide concrete recommendations for researchers and practitioners when assessing and comparing the robustness of trained models. Furthermore, we characterize scale as a way to distinguish small and large perturbations, and relate it to inherent properties of data sets, demonstrating that robustness thresholds must be chosen accordingly. We release code to reproduce all experiments presented in this paper, which includes a Python module to calculate robustness curves for arbitrary data sets and classifiers, supporting a number of frameworks, including TensorFlow, PyTorch and JAX.
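To make the contrast between point-wise measures and robustness curves concrete, the sketch below computes a robustness curve as the empirical distribution of minimal adversarial perturbation sizes over a data set. This is not the paper's released module; it is a minimal illustration using a linear classifier, for which the minimal l2 perturbation flipping the prediction of a point x has the closed form |w·x + b| / ||w||. All function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def min_l2_perturbation(w, b, X):
    """Minimal l2 distance of each row of X to the decision
    hyperplane w.x + b = 0 of a linear classifier (closed form)."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

def robustness_curve(distances, thresholds):
    """Empirical robustness curve: for each threshold eps, the
    fraction of points whose minimal adversarial perturbation
    is at most eps. A point-wise measure reads off this curve
    at a single eps; the curve itself shows the full picture."""
    distances = np.asarray(distances)
    return np.array([(distances <= eps).mean() for eps in thresholds])

# Illustrative data: 1000 points from a standard 2-d Gaussian,
# classified by the hyperplane x1 = x2.
rng = np.random.default_rng(0)
w, b = np.array([1.0, -1.0]), 0.0
X = rng.normal(size=(1000, 2))

d = min_l2_perturbation(w, b, X)
eps_grid = np.linspace(0.0, 3.0, 50)
curve = robustness_curve(d, eps_grid)  # non-decreasing, values in [0, 1]
```

Comparing two classifiers at one threshold on such a curve can invert at another threshold, which is the global property the abstract argues point-wise measures miss.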