HealthE: Classifying Entities in Online Textual Health Advice

Paper title

HealthE: Classifying Entities in Online Textual Health Advice

Authors

Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum

Abstract

The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public-health-relevant texts, such as the health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annotated dataset, HealthE, consisting of 6,756 pieces of health advice. HealthE has a more granular label space than existing medical NER corpora and contains annotations for diverse health phrases. Additionally, we introduce a new health entity classification model, EP S-BERT, which leverages textual context patterns to classify entities. EP S-BERT improves F1 by 4 points over the nearest baseline and by 34 points over off-the-shelf medical NER tools trained to extract disease and medication mentions from clinical texts. All code and data are publicly available on GitHub.
