Paper Title

ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics

Paper Authors

Chantal Amrhein, Nikita Moghe, Liane Guillou

Paper Abstract

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
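To make the idea of "simple perturbations at the word/character level" concrete, here is a minimal sketch of how such a contrastive challenge example could be constructed. The function name and the dosage example are illustrative assumptions, not the actual ACES code or data; a real challenge set pairs each good translation with an "incorrect" variant containing a targeted accuracy error.

```python
def perturb_word(translation: str, target: str, replacement: str) -> str:
    """Create an 'incorrect' candidate by swapping one content word.

    Illustrative helper (not from ACES): the perturbed output contains a
    targeted accuracy error that a robust MT metric should penalise.
    """
    words = translation.split()
    perturbed = [replacement if w == target else w for w in words]
    return " ".join(perturbed)


# Hypothetical example in a high-stakes (medical) context:
good = "The patient should take two tablets daily."
bad = perturb_word(good, "two", "ten")  # dosage changed: an accuracy error
print(bad)  # → The patient should take ten tablets daily.
```

A metric is then tested on whether it scores the good translation above the perturbed one against the same source and reference; a challenge set collects many such contrastive pairs across phenomena.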