Paper Title
A Simple Approach to Learning Unsupervised Multilingual Embeddings
Authors
Abstract
Recent progress on unsupervised learning of cross-lingual embeddings in the bilingual setting has given impetus to learning a shared embedding space for several languages without any supervision. A popular framework for the latter problem jointly solves two sub-problems: 1) learning unsupervised word alignments between several pairs of languages, and 2) learning how to map the monolingual embeddings of every language to a shared multilingual space. In contrast, we propose a simple two-stage framework in which we decouple these two sub-problems and solve them separately using existing techniques. The proposed approach obtains surprisingly good performance on various tasks such as bilingual lexicon induction, cross-lingual word similarity, multilingual document classification, and multilingual dependency parsing. When distant languages are involved, the proposed solution demonstrates robustness and outperforms existing unsupervised multilingual word embedding approaches. Overall, our experimental results encourage the development of multi-stage models for such challenging problems.
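The abstract does not name the existing techniques used in either stage, so the following is only a minimal sketch of what the second stage could look like under stated assumptions. It assumes that stage one has already produced word alignments between each language and a single pivot language, and that the mapping is an orthogonal Procrustes rotation. Both are illustrative choices, not details confirmed by the abstract. The names `procrustes_map`, `map_to_shared_space` and `nearest_neighbors` are hypothetical, and the toy "languages" are synthetic rotations of one random embedding matrix.

```python
import numpy as np


def procrustes_map(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src @ W - Y_tgt||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt


def map_to_shared_space(embeddings, alignments, pivot):
    """Assumed stage 2: rotate every language into the pivot language's space.

    embeddings: dict lang -> (n_words, dim) array
    alignments: dict lang -> (src_idx, pivot_idx) index arrays, taken here
                as the output of an unsupervised stage-1 aligner
    """
    shared = {pivot: embeddings[pivot]}
    for lang, (src_idx, pivot_idx) in alignments.items():
        W = procrustes_map(embeddings[lang][src_idx],
                           embeddings[pivot][pivot_idx])
        shared[lang] = embeddings[lang] @ W
    return shared


def nearest_neighbors(X, Y):
    """For each row of X, the index of the most cosine-similar row of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return np.argmax(Xn @ Yn.T, axis=1)


# Toy data: "fr" and "de" are shuffled, rotated copies of the pivot "en".
rng = np.random.default_rng(0)
n_words, dim = 50, 8
en = rng.standard_normal((n_words, dim))


def make_language(base):
    """Return a shuffled, rotated copy of base and the shuffle used."""
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # random rotation
    perm = rng.permutation(n_words)  # row i corresponds to base[perm[i]]
    return base[perm] @ Q, perm


fr, perm_fr = make_language(en)
de, perm_de = make_language(en)

# Stand-in for stage 1: alignments for the first 20 words of each language.
# A real unsupervised aligner would have to discover these pairs itself.
seed = np.arange(20)
alignments = {"fr": (seed, perm_fr[:20]), "de": (seed, perm_de[:20])}

shared = map_to_shared_space({"en": en, "fr": fr, "de": de},
                             alignments, pivot="en")

# fr and de were never aligned with each other, yet nearest-neighbor search
# in the shared space pairs them up through the pivot.
fr_to_de = nearest_neighbors(shared["fr"], shared["de"])
```

Because each language is mapped independently once its alignment is fixed, adding another language in this sketch only requires one more alignment to the pivot. Whether the paper uses a pivot language or this particular mapping is not stated in the abstract.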