Paper Title

Robustness Testing of Language Understanding in Task-Oriented Dialog

Paper Authors

Jiexi Liu, Ryuichi Takanobu, Jiaxin Wen, Dazhen Wan, Hongguang Li, Weiran Nie, Cheng Li, Wei Peng, Minlie Huang

Paper Abstract

Most language understanding models in task-oriented dialog systems are trained on a small amount of annotated training data, and evaluated in a small set from the same distribution. However, these models can lead to system failure or undesirable output when being exposed to natural language perturbation or variation in practice. In this paper, we conduct comprehensive evaluation and analysis with respect to the robustness of natural language understanding models, and introduce three important aspects related to language understanding in real-world dialog systems, namely, language variety, speech characteristics, and noise perturbation. We propose a model-agnostic toolkit LAUG to approximate natural language perturbations for testing the robustness issues in task-oriented dialog. Four data augmentation approaches covering the three aspects are assembled in LAUG, which reveals critical robustness issues in state-of-the-art models. The augmented dataset through LAUG can be used to facilitate future research on the robustness testing of language understanding in task-oriented dialog.
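
The abstract describes LAUG's augmentation approaches only at a high level. As a minimal, hypothetical sketch of the noise-perturbation aspect (this is not LAUG's actual interface; the function name and parameters below are invented for illustration), the following Python snippet randomly drops, duplicates, or swaps words in a user utterance while leaving the dialog annotation untouched:

```python
import random

# Hypothetical sketch, not the LAUG API: word-level noise perturbation of a
# user utterance, approximating the kind of natural-language variation a
# robustness test set is meant to expose NLU models to.
def perturb_utterance(utterance, p=0.1, seed=None):
    """With probability p per word, randomly drop it, duplicate it,
    or swap it with the next word."""
    rng = random.Random(seed)
    words = utterance.split()
    out, i = [], 0
    while i < len(words):
        if rng.random() < p:
            op = rng.choice(["drop", "dup", "swap"])
            if op == "drop":
                i += 1                                # drop this word
                continue
            if op == "dup":
                out.extend([words[i]] * 2)            # duplicate it
                i += 1
                continue
            if op == "swap" and i + 1 < len(words):
                out.extend([words[i + 1], words[i]])  # swap adjacent words
                i += 2
                continue
        out.append(words[i])
        i += 1
    return " ".join(out)

# Example: perturb a MultiWOZ-style utterance; slot values referenced by the
# NLU labels should usually survive, keeping the augmented pair usable.
print(perturb_utterance("i need a cheap hotel in the north part of town",
                        p=0.2, seed=0))
```

In a robustness test built along these lines, the perturbed utterances are paired with the original intent and slot annotations, and the accuracy gap between the clean and perturbed sets indicates the model's sensitivity to such variation.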
