Paper Title
WEmbSim: A Simple yet Effective Metric for Image Captioning
Authors
Abstract
Automatic image caption evaluation remains an area of intensive research, driven by the need for generated captions that meet adequacy and fluency requirements. Building on our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can achieve surprisingly high performance on unsupervised caption evaluation. This motivates our proposed metric, WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD at system-level correlation with human judgments. Moreover, it achieves the best accuracy at matching human consensus scores for caption pairs among commonly used unsupervised methods. We therefore believe that WEmbSim sets a new baseline that any more complex metric must beat to be justified.
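The core idea described in the abstract, cosine similarity between the Mean of Word Embeddings (MOWE) of two captions, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, the helper names `mowe`, `cosine_similarity`, and `mowe_similarity` are hypothetical, whitespace tokenization is an assumption, and the abstract does not say which pre-trained embeddings are used or how multiple reference captions are combined.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real setup would load pre-trained embeddings instead (an assumption;
# the abstract does not specify which embeddings are used).
TOY_EMBEDDINGS = {
    "a": [0.1, 0.0, 0.2],
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.0, 0.9, 0.1],
    "sleeps": [0.0, 0.1, 0.9],
}


def mowe(caption, embeddings):
    """Return the mean of the word embeddings of a caption.

    Tokenization is a plain lowercase whitespace split (an assumption);
    words without an embedding are skipped.
    """
    vectors = [embeddings[t] for t in caption.lower().split() if t in embeddings]
    if not vectors:
        raise ValueError("caption contains no words with known embeddings")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def mowe_similarity(candidate, reference, embeddings=TOY_EMBEDDINGS):
    """Score a candidate caption against a single reference caption."""
    return cosine_similarity(mowe(candidate, embeddings), mowe(reference, embeddings))


if __name__ == "__main__":
    reference = "a dog runs"
    for candidate in ("a puppy runs", "a dog sleeps"):
        print(candidate, round(mowe_similarity(candidate, reference), 3))
```

With these toy vectors, "a puppy runs" scores closer to the reference "a dog runs" than "a dog sleeps" does, because "puppy" was placed near "dog" in the invented embedding space while "sleeps" was placed far from "runs". Averaging discards word order, which is part of what makes the measure so simple.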