Paper Title
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Paper Authors
Paper Abstract
Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting. We formulate and study Attributed QA as a key first step in the development of attributed LLMs. We propose a reproducible evaluation framework for the task and benchmark a broad set of architectures. We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development. Our experimental work gives concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and gives some hints as to how to address a third (How to build LLMs with attribution?).
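The abstract's claim that "a correlated automatic metric is suitable for development" amounts to checking how well an automatic attribution score tracks human attribution judgments across systems. The sketch below illustrates that check; the score lists (`human_attribution`, `auto_attribution`) are hypothetical placeholders, not the paper's data or metric.

```python
# A minimal sketch (not the paper's implementation): comparing a hypothetical
# automatic attribution metric against human attribution judgments to check
# whether the automatic metric correlates well enough to use for development.
from scipy.stats import pearsonr

# Hypothetical per-system scores: fraction of answers judged attributable.
human_attribution = [0.42, 0.55, 0.61, 0.48, 0.70]   # human (gold-standard) ratings
auto_attribution  = [0.39, 0.58, 0.65, 0.44, 0.73]   # automatic metric on the same systems

corr, p_value = pearsonr(human_attribution, auto_attribution)
print(f"Pearson r = {corr:.3f} (p = {p_value:.3g})")
# A high correlation suggests the automatic metric can stand in for human
# evaluation when iterating on models, as the abstract argues.
```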