AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

Paper Title

AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

Authors

Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

Abstract

Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally treat labels as categorical variables and ignore the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It builds a unique representation of each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach shows an advantage in detecting hate speech, especially in the minority class and in edge cases with annotator disagreement. The improvement in overall performance is largest when the dataset is more label-imbalanced, suggesting practical value in identifying real-world hate speech, since the volume of hate speech in the wild on social media is extremely small compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to model performance, and test a range of alternative annotator embeddings and label text combinations.
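As a rough illustration of the evaluation step described in the abstract, where per-annotator label choices are combined into one aggregated label, the sketch below applies a simple majority vote. This is not the paper's implementation: the abstract does not specify the aggregation rule, so majority voting, the tie-breaking choice, and the names `aggregate_labels`, `agreement`, and `tie_label` are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_labels(annotator_labels, tie_label="hate"):
    """Combine per-annotator labels into one label by majority vote.

    annotator_labels maps a (hypothetical) annotator id to that annotator's
    predicted label for a single text. tie_label is an assumed tie-breaker,
    not something taken from the paper.
    """
    if not annotator_labels:
        raise ValueError("need at least one annotator label")
    counts = Counter(annotator_labels.values())
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]
    if len(winners) == 1:
        return winners[0]
    # Tie: prefer the assumed tie-break label, else fall back to a stable order.
    return tie_label if tie_label in winners else sorted(winners)[0]


def agreement(annotator_labels):
    """Fraction of annotators who chose the most common label."""
    counts = Counter(annotator_labels.values())
    return max(counts.values()) / len(annotator_labels)


if __name__ == "__main__":
    predictions = {"a1": "hate", "a2": "non-hate", "a3": "hate"}
    print(aggregate_labels(predictions), round(agreement(predictions), 2))
```

A low `agreement` value flags the kind of edge case with annotator disagreement that the abstract highlights.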
