Paper Title

Pre-training Text Representations as Meta Learning

Paper Authors

Shangwen Lv, Yuechen Wang, Daya Guo, Duyu Tang, Nan Duan, Fuqing Zhu, Ming Gong, Linjun Shou, Ryan Ma, Daxin Jiang, Guihong Cao, Ming Zhou, Songlin Hu

Paper Abstract

Pre-training text representations has recently been shown to significantly improve the state-of-the-art in many natural language processing tasks. The central goal of pre-training is to learn text representations that are useful for subsequent tasks. However, existing approaches are optimized by minimizing a proxy objective, such as the negative log likelihood of language modeling. In this work, we introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps. The standard multi-task learning objective adopted in BERT is a special case of our learning algorithm where the depth of meta-train is zero. We study the problem in two settings: unsupervised pre-training and supervised pre-training with different pre-training objectives to verify the generality of our approach. Experimental results show that our algorithm brings improvements and learns better initializations for a variety of downstream tasks.
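
The abstract frames pre-training as MAML-style meta-learning, with standard multi-task pre-training recovered when the meta-train depth is zero. Below is a minimal, self-contained sketch of that idea, not the authors' implementation: the toy linear model, synthetic tasks, loss, and names such as `meta_pretrain_step`, `inner_lr`, and `meta_depth` are illustrative assumptions, while the overall structure (inner-loop adaptation on support data, outer loss on query data, depth 0 collapsing to plain multi-task training) follows the description in the abstract.

```python
# Minimal sketch (assumed, not the authors' code) of pre-training as MAML-style
# meta-learning. `meta_depth` is the number of inner-loop adaptation steps;
# meta_depth = 0 skips adaptation, so the objective reduces to ordinary
# multi-task pre-training (the BERT-style special case noted in the abstract).

import torch

def forward(params, x):
    # Toy linear model standing in for the text encoder.
    return x @ params["w"] + params["b"]

def task_loss(params, batch):
    x, y = batch
    return ((forward(params, x) - y) ** 2).mean()  # placeholder per-task loss

def meta_pretrain_step(params, tasks, inner_lr=0.1, meta_depth=1):
    """One meta-training step; `tasks` is a list of (support_batch, query_batch) pairs."""
    meta_loss = 0.0
    for support, query in tasks:
        # Inner loop: adapt from the shared initialization on the support data.
        fast = dict(params)
        for _ in range(meta_depth):
            grads = torch.autograd.grad(
                task_loss(fast, support), list(fast.values()), create_graph=True
            )
            fast = {k: v - inner_lr * g for (k, v), g in zip(fast.items(), grads)}
        # Outer objective: loss of the adapted parameters on the query data.
        meta_loss = meta_loss + task_loss(fast, query)
    return meta_loss / len(tasks)

# Toy usage with two synthetic "pre-training tasks".
torch.manual_seed(0)
params = {"w": torch.randn(4, 1, requires_grad=True),
          "b": torch.zeros(1, requires_grad=True)}
make_batch = lambda: (torch.randn(8, 4), torch.randn(8, 1))
tasks = [(make_batch(), make_batch()) for _ in range(2)]

opt = torch.optim.Adam(params.values(), lr=1e-2)
loss = meta_pretrain_step(params, tasks, meta_depth=1)  # meta_depth=0 -> multi-task loss
opt.zero_grad()
loss.backward()
opt.step()
```

With `meta_depth=0` the inner loop never runs, so the returned value is just the average task loss at the shared initialization, i.e. the standard multi-task pre-training objective.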
