Paper title
Studying the role of named entities for content preservation in text style transfer
Paper authors
Paper abstract
Text style transfer techniques are gaining popularity in Natural Language Processing and have found applications such as text detoxification, sentiment transfer, and formality transfer. However, most existing approaches have been tested on domains such as online communication on public platforms, music, or entertainment, and none has been applied to the domains typical of task-oriented production systems, such as arranging personal plans (e.g., booking flights or reserving a table in a restaurant). We fill this gap by studying formality transfer in this domain. We note that texts in this domain are full of named entities, which are essential for preserving the original meaning of the text. For example, if someone communicates the destination city of a flight, it must not be altered. We therefore concentrate on the role of named entities in content preservation for formality text style transfer. We collect a new dataset for evaluating content similarity measures in text style transfer. It is drawn from a corpus of task-oriented dialogues and contains many important entities related to realistic requests, which makes it particularly useful for testing style transfer models before they are used in production. In addition, we perform an error analysis of a pre-trained formality transfer model and introduce a simple technique that uses information about named entities to improve the performance of baseline content similarity measures used in text style transfer.