Paper Title

Bidirectional Language Models Are Also Few-shot Learners

Authors

Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch

Abstract

Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Using the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate that its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models such as GPT-3 and XGLM (Lin et al., 2021), despite mT5 having approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate that prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
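The core idea behind SAP, as the name suggests, is to drive a bidirectional (masked) model autoregressively: query the model for one new token at a time, append the prediction to the prompt, and repeat. The loop below is a minimal toy sketch of that pattern, not the paper's implementation; `fill_mask` is a hypothetical stand-in for a real model such as mT5, and its canned lookup exists only so the example runs self-contained.

```python
# Toy sketch of sequential autoregressive prompting: a masked LM is
# queried repeatedly, and each prediction is fed back into the prompt.

def fill_mask(text_with_mask):
    """Hypothetical masked-LM call: return the token predicted for the
    single <mask> slot at the end of `text_with_mask`. A real system
    would call a bidirectional model (e.g. mT5) here."""
    canned = {
        "Translate English to French: cheese => <mask>": "fromage",
        "Translate English to French: cheese => fromage <mask>": "<eos>",
    }
    return canned.get(text_with_mask, "<eos>")

def sap_generate(prompt, max_tokens=10, eos="<eos>"):
    """Generate a completion token by token, re-masking the next slot
    on every step until the model emits an end-of-sequence token."""
    tokens = []
    for _ in range(max_tokens):
        query = " ".join([prompt] + tokens + ["<mask>"])
        next_token = fill_mask(query)
        if next_token == eos:
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(sap_generate("Translate English to French: cheese =>"))
# prints "fromage"
```

With a real masked-LM backend, the same loop turns a model trained only on denoising into a left-to-right generator, which is what makes few-shot prompting of bidirectional models possible.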
