TwiBot-22: Towards Graph-Based Twitter Bot Detection

Paper Title

TwiBot-22: Towards Graph-Based Twitter Bot Detection

Paper Authors

Shangbin Feng, Zhaoxuan Tan, Herun Wan, Ningnan Wang, Zilong Chen, Binchi Zhang, Qinghua Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun Feng, Qingyue Zhang, Hongrui Wang, Yuhan Liu, Yuyang Bai, Heng Wang, Zijian Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong Li, Minnan Luo

Paper Abstract

Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, as well as low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers could consistently evaluate new models and datasets. The TwiBot-22 Twitter bot detection benchmark and evaluation framework are publicly available at https://twibot22.github.io/
