Paper title
Personal Attribute Prediction from Conversations
Paper authors
Paper abstract
Personal knowledge bases (PKBs) are critical to many applications, such as Web-based chatbots and personalized recommendation. Conversations containing rich personal knowledge can be regarded as a main source for populating a PKB. Given a user, a user attribute, and the user's utterances from a conversational system, we aim to predict the value of that personal attribute for the user, which helps enrich PKBs. However, previous studies have three issues: (1) manually labeled utterances are required for model training; (2) personal attribute knowledge embedded in both utterances and external resources is underutilized; (3) performance on some difficult personal attributes is unsatisfactory. In this paper, we propose DSCGN, a framework based on a pre-trained language model with a noise-robust loss function, to predict personal attributes from conversations without requiring any labeled utterances. To fine-tune the language model, we derive two categories of supervision by mining the personal attribute knowledge embedded in both unlabeled utterances and external resources: document-level supervision via a distant supervision strategy, and contextualized word-level supervision via a label guessing method. Extensive experiments on two real-world data sets (a profession data set and a hobby data set) show that our framework obtains the best performance compared with all twelve baselines in terms of nDCG and MRR.
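For readers unfamiliar with the two ranking metrics named above, the following is a minimal sketch of how MRR and nDCG are commonly computed over a ranked list of candidate attribute values. It is not taken from the paper: the function names, the linear gain, the log2 discount, and the toy profession labels are all assumptions made for illustration, and the paper's exact nDCG variant and cutoff are not stated in the abstract.

```python
import math


def reciprocal_rank(ranked, gold):
    """Return 1/rank of the first gold value in the ranking, or 0.0 if none appears."""
    for rank, value in enumerate(ranked, start=1):
        if value in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, golds):
    """Average the reciprocal rank over all users (one ranking and one gold set per user)."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(rankings)


def ndcg(ranked, relevance, k=None):
    """nDCG with linear gain and a log2(rank + 1) discount (an assumed, common variant).

    `relevance` maps an attribute value to its graded relevance; missing values count as 0.
    """
    k = len(ranked) if k is None else k
    gains = [relevance.get(value, 0.0) for value in ranked[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: a model ranks candidate professions for two users.
rankings = [["teacher", "nurse", "chef"], ["nurse", "teacher", "chef"]]
golds = [{"nurse"}, {"nurse"}]
mrr = mean_reciprocal_rank(rankings, golds)
score = ndcg(rankings[0], {"nurse": 1.0})
```

Both metrics reward placing the correct attribute value near the top of the ranking. MRR looks only at the first correct value, while nDCG can also credit several values with graded relevance.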