Paper Title

MiQA: A Benchmark for Inference on Metaphorical Questions

Paper Authors

Iulia-Maria Comsa, Julian Martin Eisenschlos, Srini Narayanan

Paper Abstract

We propose a benchmark to assess the capability of large language models to reason with conventional metaphors. Our benchmark combines the previously isolated topics of metaphor detection and commonsense reasoning into a single task that requires a model to make inferences by accurately selecting between the literal and metaphorical register. We examine the performance of state-of-the-art pre-trained models on binary-choice tasks and find a large discrepancy between the performance of small and very large models, going from chance to near-human level. We also analyse the largest model in a generative setting and find that although human performance is approached, careful multiple-shot prompting is required.
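
As a purely illustrative sketch, the snippet below shows one way a binary-choice item of the kind the abstract describes could be framed in a multiple-shot prompt. The example items, prompt layout, and function names are assumptions for illustration only; they are not drawn from the MiQA benchmark and do not reflect the authors' exact protocol.

```python
# A minimal sketch of posing a binary-choice metaphorical-inference item
# to a language model via a multiple-shot prompt. The items below are
# hypothetical illustrations, NOT actual MiQA benchmark data, and the
# prompt format is an assumption, not the paper's published setup.

FEW_SHOT_EXAMPLES = [
    # Each item pairs a metaphorical statement with two candidate
    # inferences; only one follows under the intended (metaphorical
    # vs. literal) register.
    {
        "premise": "My lawyer is a shark.",
        "choices": [
            "My lawyer is aggressive in negotiations.",   # metaphorical reading
            "My lawyer has fins and lives in the ocean.",  # literal reading
        ],
        "answer": 0,
    },
]

def build_prompt(examples, premise, choices):
    """Assemble a multiple-shot prompt ending with an unanswered item."""
    parts = []
    for ex in examples:
        parts.append(f"Statement: {ex['premise']}")
        for i, choice in enumerate(ex["choices"]):
            parts.append(f"({chr(65 + i)}) {choice}")
        parts.append(f"Answer: ({chr(65 + ex['answer'])})\n")
    # Append the query item, leaving the answer for the model to complete.
    parts.append(f"Statement: {premise}")
    for i, choice in enumerate(choices):
        parts.append(f"({chr(65 + i)}) {choice}")
    parts.append("Answer:")
    return "\n".join(parts)

print(build_prompt(
    FEW_SHOT_EXAMPLES,
    "Her argument was built on sand.",
    ["The argument is poorly supported.",
     "The argument involves beach construction."],
))
```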
