Paper Title
Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders
Authors
Abstract
Automatic chat summarization can help people quickly grasp important information from large volumes of chat messages. Unlike conventional documents, chat logs usually have fragmented and evolving topics. In addition, these logs contain many elliptical and interrogative sentences, which makes chat summarization highly context-dependent. In this work, we propose RankAE, a novel unsupervised framework that performs chat summarization without manually labeled data. RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously, and a denoising auto-encoder carefully designed to generate succinct but context-informative summaries from the selected utterances. To evaluate the proposed method, we collect a large-scale dataset of chat logs from a customer service environment and build an annotated set used only for model evaluation. Experimental results show that RankAE significantly outperforms other unsupervised methods and generates high-quality summaries in terms of relevance and topic coverage.
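The abstract does not give the ranking formula, so the following is only an illustrative sketch of what selecting utterances by centrality and diversity at the same time could look like. It is not RankAE's actual procedure. It assumes bag-of-words cosine similarity, centrality defined as mean similarity to the other utterances, and a greedy selection in the style of maximal marginal relevance. The names `select_topic_utterances`, `k`, and `trade_off` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector (assumption: whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_topic_utterances(utterances, k=2, trade_off=0.5):
    """Greedily pick k utterance indices, trading centrality against redundancy.

    Hypothetical scoring (not taken from the paper):
        score = trade_off * centrality - (1 - trade_off) * redundancy
    """
    vecs = [bow(u) for u in utterances]
    n = len(vecs)
    # Centrality: mean similarity of an utterance to all other utterances.
    centrality = [
        sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            score = trade_off * centrality[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    # Return indices in chat order so the selection follows the conversation.
    return sorted(selected)


chat = ["my order is late", "my order is late", "refund my payment"]
print(select_topic_utterances(chat, k=2))
```

In this toy chat the two identical utterances are the most central, but the redundancy penalty keeps the second copy out and the refund utterance is chosen instead. That is the kind of topic coverage the ranking step is described as aiming for.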