Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Paper Title

Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Authors

Tianlong Chen, Zhenyu Zhang, Sijia Liu, Yang Zhang, Shiyu Chang, Zhangyang Wang

Abstract

Pre-training serves as a broadly adopted starting point for transfer learning on various downstream tasks. Recent investigations of the lottery ticket hypothesis (LTH) demonstrate that such enormous pre-trained models can be replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without sacrificing transferability. However, practical security-crucial applications usually pose more challenging requirements beyond standard transfer, which also require these subnetworks to overcome adversarial vulnerability. In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a subnetwork located in a pre-trained model can be independently transferred to diverse downstream tasks and reach BOTH the same standard and robust generalization as the full pre-trained model, under BOTH standard and adversarial training regimes. We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts. For example, on the downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with standard, fast adversarial, and adversarial pre-training from ImageNet at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. Furthermore, we observe that the obtained double-win lottery tickets can be more data-efficient to transfer under practical data-limited (e.g., 1% and 10%) downstream schemes. Our results show that the benefits of robust pre-training are amplified by the lottery ticket scheme, as well as by the data-limited transfer setting. Code is available at https://github.com/VITA-Group/Double-Win-LTH.
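
The sparsity levels quoted in the abstract happen to coincide with what iterative pruning would produce if 20% of the remaining weights were removed in each round. The abstract does not state the pruning schedule, so the 20% rate below is an assumption, and the helper names are hypothetical. The following Python sketch only checks that arithmetic; it does not reproduce the authors' method.

```python
def sparsity_after_rounds(rounds, prune_fraction=0.2):
    """Fraction of weights removed after `rounds` pruning rounds, where each
    round removes `prune_fraction` of the weights that still remain.
    The 20% default is an assumption, not a value given in the abstract."""
    remaining = (1.0 - prune_fraction) ** rounds
    return 1.0 - remaining


def rounds_for_sparsity(target_percent, prune_fraction=0.2, max_rounds=50):
    """Return the round count whose sparsity matches `target_percent`
    to two decimal places, or None if no round within `max_rounds` matches."""
    for k in range(max_rounds + 1):
        percent = 100.0 * sparsity_after_rounds(k, prune_fraction)
        if abs(percent - target_percent) < 0.005:
            return k
    return None


# Sparsity percentages reported in the abstract.
reported = [89.26, 73.79, 79.03, 91.41, 83.22]
matches = {s: rounds_for_sparsity(s) for s in reported}
print(matches)
```

Under this assumed schedule, each reported percentage maps to a whole number of rounds (for example, 89.26% corresponds to 10 rounds, since 1 - 0.8^10 ≈ 0.8926), so a higher sparsity simply means the subnetwork survived more pruning rounds while still matching the full model.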
