Paper title
A Semi-supervised Approach for a Better Translation of Sentiment in Dialectical Arabic UGT
Authors
Abstract
In the online world, Machine Translation (MT) systems are extensively used to translate User-Generated Text (UGT) such as reviews, tweets, and social media posts, where the main message is often the author's positive or negative attitude towards the topic of the text. However, MT systems still lack accuracy for some low-resource languages and sometimes make critical translation errors that completely flip the sentiment polarity of the target word or phrase and hence deliver the wrong affective message. This is particularly noticeable in texts that do not follow common lexico-grammatical standards, such as the dialectical Arabic (DA) used on online platforms. In this research, we aim to improve the translation of sentiment in UGT from dialectical Arabic into English. Given the scarcity of gold-standard parallel data for DA-EN in the UGT domain, we introduce a semi-supervised approach that exploits both monolingual and parallel data to train a Neural Machine Translation (NMT) system initialised by a cross-lingual language model trained with supervised and unsupervised modeling objectives. We assess the accuracy of sentiment translation by our proposed system through a numerical 'sentiment-closeness' measure as well as human evaluation. We show that our semi-supervised MT system can significantly help correct sentiment errors detected in the online translation of dialectical Arabic UGT.