Paper Title

Robin: A Novel Online Suicidal Text Corpus of Substantial Breadth and Scale

Authors

Daniel DiPietro, Vivek Hazari, Soroush Vosoughi

Abstract

Suicide is a major public health crisis. With more than 20,000,000 suicide attempts each year, the early detection of suicidal intent has the potential to save hundreds of thousands of lives. Traditional mental health screening methods are time-consuming, costly, and often inaccessible to disadvantaged populations; online detection of suicidal intent using machine learning offers a viable alternative. Here we present Robin, the largest non-keyword generated suicidal corpus to date, consisting of over 1.1 million online forum postings. In addition to its unprecedented size, Robin is specially constructed to include various categories of suicidal text, such as suicide bereavement and flippant references, better enabling models trained on Robin to learn the subtle nuances of text expressing suicidal ideation. Experimental results achieve state-of-the-art performance for the classification of suicidal text, both with traditional methods like logistic regression (F1=0.85), as well as with large-scale pre-trained language models like BERT (F1=0.92). Finally, we release the Robin dataset publicly as a machine learning resource with the potential to drive the next generation of suicidal sentiment research.
