Paper Title
A Framework for Understanding Model Extraction Attack and Defense
Paper Authors
Paper Abstract
The privacy of machine learning models has become a significant concern in many emerging Machine-Learning-as-a-Service applications, where prediction services based on well-trained models are offered to users on a pay-per-query basis. The lack of a defense mechanism can impose a high risk on the privacy of the server's model, since an adversary could efficiently steal the model by querying only a few `good' data points. The interplay between a server's defense and an adversary's attack inevitably leads to an arms-race dilemma, as commonly seen in Adversarial Machine Learning. To study the fundamental tradeoff between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify the tradeoff, analyze their theoretical properties, and formulate an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the `equilibrium' between privacy and utility. On the optimization side, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem. We demonstrate the developed results through examples and experiments.
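To make the "min-max bi-level" phrase concrete, here is a generic, hedged sketch of how such an attack-defense problem is often written; the symbols below (defense parameter δ, query set {q_i}, server model f, surrogate g) are illustrative conventions, not notation taken from this paper:

```latex
% Outer level: the defender chooses a perturbation \delta to trade off
% utility loss against the worst-case fidelity an attacker can extract.
% Inner level: the adversary fits a surrogate g on the defended responses.
\min_{\delta \in \Delta} \;
  \underbrace{U\!\left(f_{\delta}\right)}_{\text{utility loss}}
  \;+\; \lambda \,
  \max_{q_{1},\dots,q_{n}} \;
  \underbrace{\mathrm{Fid}\!\left(\hat{g}_{\delta,q},\, f\right)}_{\text{extracted fidelity}}
\quad \text{s.t.} \quad
\hat{g}_{\delta,q} \in \arg\min_{g \in \mathcal{G}}
  \sum_{i=1}^{n} \ell\!\left(g(q_{i}),\, f_{\delta}(q_{i})\right).
```

The outer min-max captures the arms race between server and adversary, while the inner argmin is the bi-level component: the adversary's surrogate is itself the solution of a training problem over the defended query responses.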