Title
Revisiting Checkpoint Averaging for Neural Machine Translation
Authors
Abstract
Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform, and the fact that the translation improvement comes almost for free makes it widely adopted in neural machine translation research. Despite its popularity, the method itself simply takes the mean of the model parameters from several checkpoints, the selection of which is mostly based on empirical recipes without much justification. In this work, we revisit the concept of checkpoint averaging and consider several extensions. Specifically, we experiment with ideas such as using different checkpoint selection strategies, calculating a weighted average instead of a simple mean, making use of gradient information, and fine-tuning the interpolation weights on development data. Our results confirm the necessity of applying checkpoint averaging for optimal performance, but also suggest that the landscape between the converged checkpoints is rather flat, and that little further improvement over simple averaging is to be obtained.
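The core operation described in the abstract, taking a (possibly weighted) mean of parameters across checkpoints, can be sketched as follows. This is a minimal illustration using plain Python dicts of floats in place of real model state dicts; the function name and data layout are hypothetical, not from the paper's codebase.

```python
def average_checkpoints(checkpoints, weights=None):
    """Interpolate model parameters across checkpoints.

    checkpoints: list of dicts mapping parameter name -> list of floats
                 (stand-in for framework-specific state dicts)
    weights: optional interpolation weights, one per checkpoint;
             defaults to uniform weights, i.e. the simple mean.
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0 / n] * n  # simple averaging, the common recipe
    averaged = {}
    for name in checkpoints[0]:
        size = len(checkpoints[0][name])
        # element-wise weighted sum over the checkpoints
        averaged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints))
            for i in range(size)
        ]
    return averaged

# toy example: two "checkpoints" with one parameter tensor each
c1 = {"w": [1.0, 2.0]}
c2 = {"w": [3.0, 4.0]}
print(average_checkpoints([c1, c2]))          # simple mean: {'w': [2.0, 3.0]}
print(average_checkpoints([c1, c2], [0.8, 0.2]))  # weighted average
```

The weighted variant corresponds to the paper's extension of replacing the simple mean with interpolation weights that could, for example, be tuned on development data.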