DiscSense: Automated Semantic Analysis of Discourse Markers

Paper title

DiscSense: Automated Semantic Analysis of Discourse Markers

Authors

Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller

Abstract

Discourse markers ("by contrast", "happily", etc.) are words or phrases used to signal semantic and/or pragmatic relationships between clauses or sentences. Recent work has fruitfully explored the prediction of discourse markers between sentence pairs in order to learn accurate sentence representations that are useful in various classification tasks. In this work, we take another perspective: using a model trained to predict discourse markers between sentence pairs, we predict plausible markers between sentence pairs with a known semantic relation (provided by existing classification datasets). These predictions allow us to study the link between discourse markers and the semantic relations annotated in classification datasets. Handcrafted mappings between markers and discourse relations have been proposed, but only for a limited set of markers and a limited set of categories. Yet there exist hundreds of discourse markers expressing a wide variety of relations, and there is no consensus on the taxonomy of relations between competing discourse theories (which are largely built in a top-down fashion). By using an automatic prediction method over existing semantically annotated datasets, we provide a bottom-up characterization of discourse markers in English. The resulting dataset, named DiscSense, is publicly available.
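
To make the idea of linking predicted markers to annotated relations concrete, here is a minimal sketch of one way such a link could be tabulated. It is an illustration under stated assumptions, not the authors' procedure: the function name `marker_label_associations`, the `min_count` threshold, the choice of the conditional frequency of a label given a marker as the score, and the toy data are all hypothetical.

```python
from collections import Counter


def marker_label_associations(pairs, min_count=2):
    """Estimate P(label | marker) from (predicted_marker, dataset_label) pairs.

    Hypothetical helper: markers predicted fewer than `min_count` times
    are skipped, since their frequency estimates would be unreliable.
    """
    pairs = list(pairs)
    joint = Counter(pairs)                                  # count of each (marker, label)
    marker_totals = Counter(marker for marker, _ in pairs)  # count of each marker
    scores = {}
    for (marker, label), n in joint.items():
        if marker_totals[marker] >= min_count:
            scores[(marker, label)] = n / marker_totals[marker]
    return scores


# Toy data, invented for illustration: each tuple pairs a marker that a
# model might predict for a sentence pair with that pair's dataset label.
toy = [
    ("however", "contradiction"),
    ("however", "contradiction"),
    ("however", "neutral"),
    ("therefore", "entailment"),
    ("therefore", "entailment"),
    ("sadly", "negative"),
]

assoc = marker_label_associations(toy)
for (marker, label), score in sorted(assoc.items()):
    print(f"{marker:10s} {label:14s} {score:.2f}")
```

In this toy run, "sadly" is dropped because it appears only once. A real analysis would need far more data and could use a different association measure.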
