Early Discovery of Disappearing Entities in Microblogs

Paper title

Early Discovery of Disappearing Entities in Microblogs

Authors

Satoshi Akasaki, Naoki Yoshinaga, Masashi Toyoda

Abstract

We make decisions by reacting to changes in the real world, in particular the emergence and disappearance of impermanent entities such as events, restaurants, and services. Because we want to avoid missing opportunities or taking fruitless actions after such entities have disappeared, it is important to know as early as possible when they disappear. We thus tackle the task of detecting disappearing entities in a timely manner from microblogs, whose posts mention a wide variety of entities. The major challenge is detecting the uncertain contexts of disappearing entities in noisy microblog posts. To collect these disappearing contexts, we design time-sensitive distant supervision for this task, which utilizes entities from a knowledge base together with time-series posts, and we use it to build large-scale Twitter datasets for English and Japanese. (We will release the datasets used in the experiments, as tweet IDs, to promote reproducibility.) To ensure robust detection in noisy environments, we refine the pretrained word embeddings of the detection model on microblog streams of the target day. Experimental results on the Twitter datasets confirmed the effectiveness of the collected labeled data and the refined word embeddings: more than 70% of the detected disappearing entities in Wikipedia were discovered earlier than the corresponding Wikipedia updates, with an average lead time of over one month.
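The two evaluation figures above (the share of entities discovered before the Wikipedia update, and the average lead time) reduce to simple date arithmetic. The sketch below is illustrative only and rests on assumptions: each detected entity is given a detection date and a Wikipedia update date, and the average lead time is taken over the entities detected before the update (the abstract does not specify which set it averages over). The helper `lead_time_stats` and the toy entities are hypothetical; they are not the authors' evaluation code or data.

```python
from datetime import date
from statistics import mean


def lead_time_stats(detections):
    """Hypothetical helper (not the authors' code).

    detections: dict mapping entity -> (detected_on, wikipedia_updated_on).
    Returns (share of entities detected strictly before the Wikipedia update,
             mean lead time in days over those early detections).
    Assumption: the mean is computed only over early detections.
    """
    # Lead time in days: positive means we detected it before Wikipedia changed.
    leads = [(wiki - det).days for det, wiki in detections.values()]
    early = [d for d in leads if d > 0]
    share = len(early) / len(leads) if leads else 0.0
    avg = mean(early) if early else 0.0
    return share, avg


# Toy, made-up dates purely to show the arithmetic.
toy = {
    "entity_a": (date(2019, 1, 1), date(2019, 2, 15)),  # 45 days early
    "entity_b": (date(2019, 3, 10), date(2019, 3, 5)),  # 5 days late
    "entity_c": (date(2019, 5, 1), date(2019, 5, 31)),  # 30 days early
}
share, avg = lead_time_stats(toy)
print(f"{share:.0%} detected early, mean lead time {avg:.1f} days")
# prints "67% detected early, mean lead time 37.5 days"
```

Averaging only over early detections matches a reading of "lead time" as a gain over Wikipedia; averaging over all detections (letting late ones count as negative) is an equally plausible alternative under a different assumption.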