Modeling sequential annotations for sequence labeling with crowds

Paper title

Modeling sequential annotations for sequence labeling with crowds

Authors

Lu, Xiaolei, Chow, Tommy W. S.

Abstract

Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike tagging independent instances, the quality of a crowd-annotated label sequence depends on each annotator's expertise in capturing the internal dependencies among the tokens in the sequence. In this paper, we propose Modeling Sequential Annotation for Sequence Labeling with Crowds (SA-SLC). First, a conditional probabilistic model is developed to jointly model sequential data and annotators' expertise, in which a categorical distribution is introduced to estimate the reliability of each annotator in capturing local and non-local label dependencies for sequential annotation. To accelerate the marginalization of the proposed model, a valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the token level and further prunes sub-paths in the forward inference for label sequence decoding. This reduces the number of candidate label sequences and improves the quality of the possible ground-truth label sequences. Experimental results on several sequence labeling tasks in natural language processing show the effectiveness of the proposed model.
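
The abstract does not give the details of VLSE, so the following Python sketch is only an illustration of the general idea it describes: collect candidate labels per token from the crowd, then prune sub-paths during a forward pass. It is not the authors' algorithm. The BIO tagging scheme, the validity rule (an I-X tag may only follow B-X or I-X), and the function names token_candidates, is_valid_transition and valid_sequences are all assumptions made for this example.

```python
from typing import List, Sequence


def token_candidates(annotations: Sequence[Sequence[str]]) -> List[List[str]]:
    """For each token position, collect the distinct labels given by any annotator."""
    length = len(annotations[0])
    if any(len(a) != length for a in annotations):
        raise ValueError("all annotations must cover the same tokens")
    # sorted() keeps the candidate order deterministic
    return [sorted({a[i] for a in annotations}) for i in range(length)]


def is_valid_transition(prev: str, curr: str) -> bool:
    """Assumed BIO rule: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True


def valid_sequences(annotations: Sequence[Sequence[str]]) -> List[List[str]]:
    """Enumerate candidate label sequences, pruning invalid sub-paths as they grow."""
    candidates = token_candidates(annotations)
    # A sequence cannot start inside an entity under the assumed BIO rule.
    paths = [[lab] for lab in candidates[0] if not lab.startswith("I-")]
    for labels in candidates[1:]:
        # Extend each surviving sub-path only with labels that keep it valid.
        paths = [
            path + [lab]
            for path in paths
            for lab in labels
            if is_valid_transition(path[-1], lab)
        ]
    return paths


if __name__ == "__main__":
    crowd = [
        ["B-PER", "I-PER", "O"],  # annotator 1
        ["O", "I-PER", "O"],      # annotator 2
        ["B-PER", "O", "O"],      # annotator 3
    ]
    for seq in valid_sequences(crowd):
        print(seq)
```

In this toy input the token-wise candidates allow 2 x 2 x 1 = 4 sequences, and pruning removes the one that places I-PER directly after O, leaving three. A full model would additionally weight the surviving sequences by each annotator's estimated reliability, which this sketch does not attempt.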
