Paper Title

Effect of Text Color on Word Embeddings

Authors

Masaya Ikoma, Brian Kenji Iwana, Seiichi Uchida

Abstract

In natural scenes and documents, we can find correlations between a text and its color. For instance, the word "hot" is often printed in red, while "cold" is often printed in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose using text color for word embeddings. While text-only word embeddings (e.g., word2vec) have been extremely successful, they often represent antonyms as similar because antonyms are often interchangeable in sentences. In this paper, we conduct two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from book cover images and analyze the correlation between the color and the meaning of a word. Second, we retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after retraining, we aim to understand which kinds of words are positively or negatively affected in their word embeddings when text color information is incorporated.
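
The abstract does not say how the color distributions are computed or how the color constraint enters retraining. What follows is a minimal Python sketch under stated assumptions, not the authors' method. Each word's color distribution is assumed to be a histogram over six hypothetical hue bins (red, orange, yellow, green, blue, purple). Two words are linked when the cosine similarity of their distributions reaches a hypothetical threshold. The constraint is swapped in as a retrofitting-style update that pulls each text-only vector toward its original position and toward its color neighbors. All function names (normalize, cosine, color_neighbors, retrofit), parameters (threshold, alpha, iterations) and data are made up for illustration.

```python
import math

# Hypothetical color bins, in order: red, orange, yellow, green, blue, purple.
# Each word's counts are assumed to come from text pixels on book-cover images.

def normalize(counts):
    """Convert raw per-bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def color_neighbors(color_dists, threshold):
    """Link word pairs whose color distributions have cosine >= threshold."""
    words = list(color_dists)
    graph = {w: [] for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(color_dists[a], color_dists[b])
            if sim >= threshold:
                graph[a].append((b, sim))
                graph[b].append((a, sim))
    return graph

def retrofit(embeddings, color_dists, threshold=0.8, alpha=1.0, iterations=10):
    """Hypothetical retrofitting-style update: each vector is pulled toward its
    original text-only embedding (weight alpha) and toward its color neighbors
    (weighted by color similarity). Words with no neighbors keep their vector."""
    graph = color_neighbors(color_dists, threshold)
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, nbrs in graph.items():
            if not nbrs:
                continue
            num = [alpha * x for x in embeddings[word]]
            denom = alpha
            for other, sim in nbrs:
                num = [n + sim * x for n, x in zip(num, new[other])]
                denom += sim
            new[word] = [n / denom for n in num]
    return new

# Toy, made-up data: the text-only vectors place the antonyms "hot" and "cold" close together.
EMBEDDINGS = {
    "hot": [1.0, 0.1], "cold": [1.0, -0.1],
    "warm": [0.6, 0.8], "cool": [0.6, -0.8],
}
COLOR_DISTS = {
    "hot": normalize([60, 25, 5, 2, 5, 3]),
    "cold": normalize([3, 2, 5, 10, 70, 10]),
    "warm": normalize([50, 30, 10, 3, 4, 3]),
    "cool": normalize([2, 3, 5, 15, 65, 10]),
}

if __name__ == "__main__":
    updated = retrofit(EMBEDDINGS, COLOR_DISTS)
    print("hot/cold before:", round(cosine(EMBEDDINGS["hot"], EMBEDDINGS["cold"]), 3))
    print("hot/cold after: ", round(cosine(updated["hot"], updated["cold"]), 3))
```

With these toy vectors, "hot" and "cold" start with a cosine similarity of about 0.98. After the update, each antonym is pulled toward a different color neighbor ("warm" or "cool"), so their similarity drops. A real pipeline would use learned embeddings and color statistics measured from images, and the threshold and alpha values would need tuning.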