Paper Title
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Paper Authors
Paper Abstract
Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on tasks specified through instructions, has shown promising zero-shot performance on various natural language processing tasks. However, it has yet to be explored for vision and multimodal tasks. In this work, we introduce MULTIINSTRUCT, the first multimodal instruction tuning benchmark dataset, which consists of 62 diverse multimodal tasks in a unified seq-to-seq format covering 10 broad categories. The tasks are derived from 21 existing open-source datasets, and each task is equipped with 5 expert-written instructions. We take OFA as the base pre-trained model for multimodal instruction tuning, and to further improve its zero-shot performance, we explore multiple transfer learning strategies to leverage the large-scale NATURAL INSTRUCTIONS dataset. Experimental results demonstrate strong zero-shot performance on various unseen multimodal tasks and the benefit of transfer learning from a text-only instruction dataset. We also design a new evaluation metric, Sensitivity, to evaluate how sensitive the model is to the variety of instructions. Our results indicate that fine-tuning the model on a diverse set of tasks and instructions reduces its sensitivity to variations in the instructions for each task.
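To make the Sensitivity idea concrete, here is a minimal sketch of one plausible way such a metric could be computed. It is an illustrative assumption, not the paper's exact definition: it measures, for each task, the coefficient of variation (standard deviation over mean) of a model's scores across that task's instruction variants, then averages over tasks. A model that is robust to instruction wording gets a lower value. The function name `instruction_sensitivity` and the toy score values are hypothetical.

```python
import statistics

def instruction_sensitivity(scores_per_task):
    """Hypothetical sensitivity metric (illustrative, not the paper's exact formula):
    for each task, compute the coefficient of variation (std / mean) of the
    model's scores across that task's instruction variants, then average over
    tasks. Lower values mean the model is less sensitive to instruction wording."""
    cvs = []
    for scores in scores_per_task:
        mean = statistics.mean(scores)
        if mean == 0:
            continue  # skip degenerate tasks to avoid division by zero
        cvs.append(statistics.pstdev(scores) / mean)
    return sum(cvs) / len(cvs)

# Toy example: two tasks, each evaluated under 5 instruction variants.
robust = instruction_sensitivity([[0.80, 0.81, 0.79, 0.80, 0.80],
                                  [0.60, 0.61, 0.60, 0.59, 0.60]])
brittle = instruction_sensitivity([[0.80, 0.40, 0.70, 0.20, 0.60],
                                   [0.60, 0.10, 0.50, 0.30, 0.70]])
print(robust < brittle)  # the robust model scores lower on sensitivity
```

Under this reading, the paper's finding that diverse instruction tuning "reduces sensitivity" corresponds to the fine-tuned model's per-task score spread shrinking across its 5 expert-written instructions.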