Doc2Bot: Accessing Heterogeneous Documents via Conversational Bots

Paper Title

Doc2Bot: Accessing Heterogeneous Documents via Conversational Bots

Authors

Haomin Fu, Yeqin Zhang, Haiyang Yu, Jian Sun, Fei Huang, Luo Si, Yongbin Li, Cam-Tu Nguyen

Abstract

This paper introduces Doc2Bot, a novel dataset for building machines that help users seek information via conversations. This is of particular interest for companies and organizations that own a large number of manuals or instruction books. Despite its potential, the nature of our task poses several challenges: (1) documents contain various structures that hinder the ability of machines to comprehend, and (2) user information needs are often underspecified. Compared to prior datasets that either focus on a single structural type or overlook the role of questioning to uncover user needs, the Doc2Bot dataset is developed to target such challenges systematically. Our dataset contains over 100,000 turns based on Chinese documents from five domains, larger than any prior document-grounded dialog dataset for information seeking. We propose three tasks in Doc2Bot: (1) dialog state tracking to track user intentions, (2) dialog policy learning to plan system actions and contents, and (3) response generation which generates responses based on the outputs of the dialog policy. Baseline methods based on the latest deep learning models are presented, indicating that our proposed tasks are challenging and worthy of further research.
