Paper Title
Best-Effort Adversarial Approximation of Black-Box Malware Classifiers
Paper Authors
Paper Abstract
An adversary who aims to steal a black-box model repeatedly queries it through a prediction API to learn a function that approximates its decision boundary. Adversarial approximation is non-trivial because of the enormous number of combinations of model architectures, parameters, and features to explore. In this context, the adversary resorts to a best-effort strategy that yields the closest approximation. This paper explores best-effort adversarial approximation of a black-box malware classifier in the most challenging setting, where the adversary's knowledge is limited to the predicted label for a given input. Starting from a limited input set for the black-box classifier, we leverage feature representation mapping and cross-domain transferability to approximate the classifier by locally training a substitute. Our approach uses different feature types for the target and the substitute models, and non-overlapping data for training the target, training the substitute, and comparing the two. We evaluate the effectiveness of our approach against two black-box classifiers trained on Windows Portable Executables (PEs). Against a Convolutional Neural Network (CNN) trained on raw byte sequences of PEs, our approach achieves a 92% accurate substitute (trained on pixel representations of PEs) and nearly 90% prediction agreement between the target and the substitute model. Against a 97.8% accurate gradient boosted decision tree trained on static PE features, our 91% accurate substitute agrees with the black-box on 90% of predictions, suggesting the strength of our purely black-box approximation.
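The label-only workflow described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation. The black-box rule, the 256-pixel image size, the synthetic byte samples, and the nearest-centroid substitute are all assumptions made for illustration, and every function name here is hypothetical. The sketch shows three steps: query the target for labels only, map raw bytes to a pixel-style representation, and measure prediction agreement on a separately drawn comparison set.

```python
import random

N_PIXELS = 256  # assumed fixed image size (e.g. a 16x16 grayscale image)


def black_box_predict(raw: bytes) -> int:
    """Toy stand-in for the target's prediction API; it returns a label only."""
    # Hypothetical decision rule, hidden from the adversary.
    return 1 if sum(raw) / len(raw) > 127.5 else 0


def bytes_to_pixels(raw: bytes) -> list:
    """Feature mapping: one byte becomes one grayscale pixel in [0, 1]."""
    padded = raw[:N_PIXELS].ljust(N_PIXELS, b"\x00")  # truncate or zero-pad
    return [b / 255.0 for b in padded]


def make_sample(rng: random.Random) -> bytes:
    """Synthetic input: bytes scattered around a random center value."""
    center = rng.randint(64, 191)
    return bytes(center + rng.randint(-60, 60) for _ in range(N_PIXELS))


def train_substitute(samples: list) -> dict:
    """Label each sample by querying the black box, then fit class centroids."""
    sums = {0: [0.0] * N_PIXELS, 1: [0.0] * N_PIXELS}
    counts = {0: 0, 1: 0}
    for raw in samples:
        label = black_box_predict(raw)  # one API query per sample
        counts[label] += 1
        for i, p in enumerate(bytes_to_pixels(raw)):
            sums[label][i] += p
    # Keep only the classes that were actually observed.
    return {c: [s / counts[c] for s in sums[c]] for c in sums if counts[c]}


def substitute_predict(centroids: dict, raw: bytes) -> int:
    """Predict the class whose centroid is nearest in pixel space."""
    pixels = bytes_to_pixels(raw)

    def dist(c: int) -> float:
        return sum((p - m) ** 2 for p, m in zip(pixels, centroids[c]))

    return min(centroids, key=dist)


def agreement(centroids: dict, samples: list) -> float:
    """Fraction of samples where substitute and black box give the same label."""
    hits = sum(substitute_predict(centroids, s) == black_box_predict(s)
               for s in samples)
    return hits / len(samples)


def run_demo(seed: int = 0) -> float:
    rng = random.Random(seed)
    train = [make_sample(rng) for _ in range(400)]      # substitute training set
    held_out = [make_sample(rng) for _ in range(200)]   # separate comparison set
    return agreement(train_substitute(train), held_out)


if __name__ == "__main__":
    print(f"agreement: {run_demo():.3f}")
```

In this toy setting the substitute never sees the target's rule or raw-byte features, only labels, which mirrors the label-only constraint. A real substitute would more plausibly be an image classifier trained on pixel renderings of PEs, as the abstract describes.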