Paper Title
Syntax-guided Localized Self-attention by Constituency Syntactic Distance
Paper Authors
Paper Abstract
Recent work has revealed that Transformers implicitly learn syntactic information in their lower layers from data, albeit in a way that is highly dependent on the quality and scale of the training data. However, learning syntactic information from data is unnecessary if we can leverage an external syntactic parser, which provides better parsing quality with well-defined syntactic structures. This could potentially improve the Transformer's performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for the Transformer that allows grammar structures from an external constituency parser to be incorporated directly. It prevents the attention mechanism from overweighting grammatically distant tokens relative to close ones. Experimental results show that our model consistently improves translation performance on a variety of machine translation datasets, ranging from small to large in size and covering different source languages.
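The abstract describes restricting self-attention so that tokens which are syntactically distant (according to an external constituency parse) cannot be overweighted. Below is a minimal sketch, in PyTorch, of one way such a distance-based attention restriction could look; it is an illustration under assumptions, not the paper's actual method. The inputs `syntactic_distance` and the threshold `max_distance` are hypothetical names introduced here for the example.

import torch
import torch.nn.functional as F


def syntax_guided_attention(q, k, v, syntactic_distance, max_distance=2):
    """Illustrative localized self-attention using a syntactic-distance mask.

    q, k, v: (batch, heads, seq_len, d_head) query/key/value tensors.
    syntactic_distance: (batch, seq_len, seq_len) pairwise distances assumed
        to be precomputed from an external constituency parser (hypothetical
        input for this sketch).
    max_distance: tokens farther apart than this in the parse are masked out.
    """
    d_head = q.size(-1)
    # Standard scaled dot-product attention scores.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_head ** 0.5

    # Disallow attention to grammatically distant tokens by masking them out,
    # keeping attention localized to syntactically close neighbours.
    distant = (syntactic_distance > max_distance).unsqueeze(1)  # broadcast over heads
    scores = scores.masked_fill(distant, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

A hard mask is only one possible design choice; a soft penalty that down-weights scores in proportion to syntactic distance would serve the same stated goal of keeping attention from overweighting grammatically distant tokens.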