TweetNERD: End to End Entity Linking Benchmark for Tweets

Paper Title

TweetNERD -- End to End Entity Linking Benchmark for Tweets

Authors

Shubhanshu Mishra, Aman Saini, Raheleh Makki, Sneha Mehta, Aria Haghighi, Ali Mollahosseini

Abstract

Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K+ Tweets across 2010-2021, for benchmarking NERD systems on Tweets. This is the largest and most temporally diverse open-sourced benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End to End Entity Linking (End2End); and report the performance of existing publicly available methods on specific TweetNERD splits. TweetNERD is available at https://doi.org/10.5281/zenodo.6617192 under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. More details are available at https://github.com/twitter-research/TweetNERD.
