Title
LCEval: Learned Composite Metric for Caption Evaluation
Authors
Abstract
Automatic evaluation metrics hold a fundamental importance in the development and fine-grained analysis of captioning systems. While current evaluation metrics tend to achieve an acceptable correlation with human judgements at the system level, they fail to do so at the caption level. In this work, we propose a neural network-based learned metric to improve caption evaluation at the caption level. To gain deeper insight into the parameters that impact a learned metric's performance, this paper investigates the relationship between different linguistic features and the caption-level correlation of the learned metrics. We also compare metrics trained with different training examples to measure the variations in their evaluation. Moreover, we perform a robustness analysis, which highlights the sensitivity of learned and handcrafted metrics to various sentence perturbations. Our empirical analysis shows that our proposed metric not only outperforms the existing metrics in terms of caption-level correlation but also shows a strong system-level correlation against human assessments.
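The core idea of a learned composite metric can be sketched as follows: scores from existing handcrafted metrics (e.g. BLEU, METEOR, CIDEr) serve as input features to a small learned model that outputs a single quality score trained to match human judgements. This is a minimal illustrative sketch, not the paper's actual architecture; the feature values, weights, and bias below are hypothetical stand-ins for parameters that would be learned from human-annotated data.

```python
import math

def composite_score(features, weights, bias):
    """Combine handcrafted metric scores into one learned quality score.

    A logistic combination stands in for the paper's neural network:
    a weighted sum of the input metric scores, squashed to [0, 1].
    """
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical handcrafted scores for one candidate caption,
# e.g. [BLEU-4, METEOR, CIDEr (rescaled to [0, 1])].
features = [0.42, 0.31, 0.55]
# Illustrative parameters; in practice these would be learned
# from captions labelled with human quality judgements.
weights = [1.2, 0.8, 1.5]
bias = -1.0

score = composite_score(features, weights, bias)
print(round(score, 3))
```

In the full approach, the single logistic unit would be replaced by a multi-layer network, and richer linguistic features could be added alongside the raw metric scores; the point of the composition is that the combined score can correlate with human judgements better than any individual handcrafted metric.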