Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Paper Title

Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Authors

Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer

Abstract

Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than more traditional MT settings since translators must balance meaning equivalence, readability, and critical interpretability in the target language. This property, along with the complex discourse-level context present in literary texts, also makes literary MT more challenging to computationally model and evaluate. To explore this task, we collect a dataset (Par3) of non-English language novels in the public domain, each aligned at the paragraph level to both human and automatic English translations. Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences. The experts note that MT outputs contain not only mistranslations, but also discourse-disrupting errors and stylistic inconsistencies. To address these problems, we train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts. We publicly release Par3 at https://github.com/katherinethai/par3/ to spur future research into literary MT.
