Paper Title
Adversarial Training for Large Neural Language Models
Paper Authors
Paper Abstract
Generalization and robustness are both key desiderata for designing machine learning methods. Adversarial training can enhance robustness, but past work often finds that it hurts generalization. In natural language processing (NLP), pre-trained large neural language models such as BERT have demonstrated impressive gains in generalization for a variety of tasks, with further improvement from adversarial fine-tuning. However, these models are still vulnerable to adversarial attacks. In this paper, we show that adversarial pre-training can improve both generalization and robustness. We propose a general algorithm ALUM (Adversarial training for large neural LangUage Models), which regularizes the training objective by applying perturbations in the embedding space that maximize the adversarial loss. We present the first comprehensive study of adversarial training in all stages, including pre-training from scratch, continual pre-training on a well-trained model, and task-specific fine-tuning. ALUM obtains substantial gains over BERT on a wide range of NLP tasks, in both regular and adversarial scenarios. Even for models that have been well trained on extremely large text corpora, such as RoBERTa, ALUM can still produce significant gains from continual pre-training, whereas conventional non-adversarial methods cannot. ALUM can be further combined with task-specific fine-tuning to attain additional gains. The ALUM code is publicly available at https://github.com/namisan/mt-dnn.
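To make the idea of "perturbations in the embedding space that maximize the adversarial loss" concrete, below is a minimal PyTorch-style sketch of such an adversarial regularizer. It is not the authors' ALUM implementation (see the mt-dnn repository for that); the Hugging-Face-style `model(inputs_embeds=..., attention_mask=...).logits` interface, the single ascent step, and the hyper-parameters `eps` and `step_size` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_regularizer(model, input_embeds, attention_mask,
                            eps=1e-5, step_size=1e-3):
    """Sketch: KL-based regularizer from an adversarial perturbation in embedding space."""
    # Clean prediction, used as the reference distribution (no gradient needed).
    with torch.no_grad():
        clean_logits = model(inputs_embeds=input_embeds,
                             attention_mask=attention_mask).logits

    # Start from small random noise on the input embeddings.
    delta = torch.randn_like(input_embeds) * eps
    delta.requires_grad_()

    # One ascent step: move the perturbation in the direction that increases
    # the divergence between perturbed and clean predictions.
    adv_logits = model(inputs_embeds=input_embeds + delta,
                       attention_mask=attention_mask).logits
    adv_loss = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                        F.softmax(clean_logits, dim=-1),
                        reduction="batchmean")
    grad, = torch.autograd.grad(adv_loss, delta)
    delta = (delta + step_size * grad / (grad.norm() + 1e-12)).detach()

    # Recompute the divergence at the updated perturbation; this term is added
    # to the regular task loss as a regularizer during (pre-)training.
    adv_logits = model(inputs_embeds=input_embeds + delta,
                       attention_mask=attention_mask).logits
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")
```

In a training loop this regularizer would simply be scaled and added to the standard objective (e.g. `loss = task_loss + alpha * adversarial_regularizer(...)`, where `alpha` is an assumed weighting hyper-parameter).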