Thesis title

Monolingual alignment of word senses and definitions in lexicographical resources

Author

Ahmadi, Sina

Abstract

本论文的重点广泛地放在词典数据的一致性上,尤其是字典。为了应对该领域的某些挑战,解决了两个主要的词性一致性和翻译推理的主要任务。鉴于在两个不同的单语词典中,头词的感觉定义,第一个任务旨在找到最佳的对齐。这是一项具有挑战性的任务,尤其是由于意义粒度,覆盖范围和两个资源的描述差异。在描述了各种词汇语义资源的特征之后,我们介绍了一个基准,其中包含17种语言的数据集,其中单词感官感官和定义由专家在不同的资源中手动注释。在创建基准测试的过程中,词典学家的知识是通过注释来纳入的,在该注释中,每个感觉对选择语义关系,即精确,狭窄,更广泛,相关或无,而无需选择。该基准可用于评估单词态对准系统的目的。使用基准评估了基于文本和非文本语义相似性检测和语义关系诱导的几种比对技术的性能。最后,我们将这项工作扩展到翻译推断,其中诱导翻译对以一种基于图形分析的各种方法以无监督的方式生成双语词典。对于资源较低和代表性不足的语言创造词典资源,这项任务特别令人感兴趣,而且还有助于增加对现有资源的覆盖范围。从实际的角度来看,本文中开发的技术和方法是在可以促进对齐任务的工具中实现的。

The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries. To tackle some of the challenges in this field, two main tasks are addressed: word sense alignment and translation inference. Given the sense definitions of a headword in two different monolingual dictionaries, the first task aims to find an optimal alignment between them. This is a challenging task, especially because the two resources may differ in sense granularity, coverage and description. After describing the characteristics of various lexical semantic resources, we introduce a benchmark containing 17 datasets in 15 languages, in which experts have manually annotated monolingual word senses and definitions across different resources. In creating the benchmark, lexicographers' knowledge is incorporated through the annotations: for each sense pair, a semantic relation is selected, namely exact, narrower, broader, related or none. The benchmark can be used to evaluate word sense alignment systems, and it is used here to evaluate the performance of a few alignment techniques based on textual and non-textual semantic similarity detection and semantic relation induction. Finally, we extend this work to translation inference, where translation pairs are induced in an unsupervised way to generate bilingual lexicons, using various approaches based on graph analysis. This task is of particular interest for creating lexicographical resources for less-resourced and under-represented languages, and it also helps increase the coverage of existing resources. From a practical point of view, the techniques and methods developed in this thesis are implemented within a tool that can facilitate the alignment task.
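The word sense alignment task can be illustrated with a deliberately naive, purely textual baseline. The sketch below is a minimal example under stated assumptions: it represents each sense definition as a bag of lower-cased words, scores every sense pair of a headword across two dictionaries with cosine similarity, and labels a pair `exact` when the score reaches a hypothetical threshold, otherwise `none`. The function names (`tokens`, `cosine`, `align_senses`) and the threshold are illustrative assumptions. The sketch does not attempt the narrower, broader or related distinctions and is not one of the systems evaluated in the thesis.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased bag of words for one sense definition (no stopword removal)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def align_senses(senses_a, senses_b, threshold=0.25):
    """Score every (sense_a, sense_b) pair and assign a coarse relation.

    Assumption: a single similarity threshold separates 'exact' from 'none';
    the value 0.25 is hypothetical, not taken from the thesis.
    Returns a list of (index_a, index_b, score, relation) tuples.
    """
    pairs = []
    for i, def_a in enumerate(senses_a):
        for j, def_b in enumerate(senses_b):
            score = cosine(tokens(def_a), tokens(def_b))
            relation = "exact" if score >= threshold else "none"
            pairs.append((i, j, round(score, 3), relation))
    return pairs


if __name__ == "__main__":
    # Two hypothetical dictionaries describing the headword "bank".
    dict_a = ["a financial institution that accepts deposits",
              "sloping land beside a river"]
    dict_b = ["an institution where people keep money as deposits",
              "the land alongside a river or lake"]
    for pair in align_senses(dict_a, dict_b):
        print(pair)
```

With these toy definitions, the two financial senses and the two river senses are labelled `exact` and the cross pairs `none`. Because function words such as "a" also count towards overlap, a realistic system would need stopword handling or, as the abstract suggests, richer semantic similarity signals.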