Paper Title

Meta-Learning Fast Weight Language Models

Authors

Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi

Abstract


Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.
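The abstract's core idea, replacing explicit gradient updates with fast weights that behave like (unnormalized) linear attention, can be sketched as a running sum of rank-1 outer products. This is only an illustrative sketch, not the paper's implementation: the dimensions, random key/query/value vectors, and variable names below are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative key/value dimension

# Fast weight matrix accumulated as a sum of rank-1 outer products,
# one "update" per past token.
W = np.zeros((d, d))
keys, values = [], []
for t in range(6):
    k = rng.standard_normal(d)  # key for token t (stand-in for a learned projection)
    v = rng.standard_normal(d)  # value for token t
    W += np.outer(v, k)         # rank-1 fast-weight update
    keys.append(k)
    values.append(v)

# Reading the fast weights with a query is exactly unnormalized
# linear attention over the stored keys and values:
q = rng.standard_normal(d)
y_fast = W @ q
y_attn = sum(v * (k @ q) for k, v in zip(keys, values))
assert np.allclose(y_fast, y_attn)
```

Because the update is a simple accumulation, it can run during both training and inference, which is the property the abstract highlights as FWLs' key advantage over test-time-only dynamic evaluation.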
