Paper Title
Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer
Paper Authors
Paper Abstract
Developing neural architectures that are capable of logical reasoning has become increasingly important for a wide range of applications (e.g., natural language processing). Towards this grand objective, we propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions. In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET, which is particularly useful for modeling natural languages. To endow it with differentiable learning capability, we closely examine various neural operators for approximating the symbolic join-chains. Interestingly, we find that the widely used multi-head self-attention module in the transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic predicate space. Our analysis not only provides a new perspective on the mechanism of pretrained models such as BERT for natural language understanding but also suggests several important directions for future improvement.
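The attention-as-join correspondence claimed in the abstract can be illustrated concretely. The following is a minimal sketch, assuming the join operator takes the form gamma(x) = exists y. beta(x, y) AND alpha(y), where beta is a binary predicate and alpha is a unary predicate; the predicate names and the NumPy setup here are illustrative assumptions, not the paper's implementation. Under a union bound, Pr[gamma(x)] <= sum_y Pr[beta(x, y)] * Pr[alpha(y)], which has exactly the shape of the attention-weighted sum computed by a single attention head.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical setup: one attention head over a sequence of n positions.
rng = np.random.default_rng(0)
n, d = 5, 8                        # sequence length, head dimension
Q = rng.normal(size=(n, d))        # queries
K = rng.normal(size=(n, d))        # keys
v = rng.uniform(size=n)            # assumed Pr[alpha(y)] at each position y

# The attention matrix plays the role of the binary predicate's
# probabilities: A[x, y] ~ Pr[beta(x, y)].
A = softmax(Q @ K.T / np.sqrt(d))

# Union bound on the join: Pr[gamma(x)] <= sum_y A[x, y] * v[y],
# i.e., a standard attention-weighted sum.
gamma_bound = A @ v

print(gamma_bound)                 # one soft "join" score per position x
```

On this reading, each head computes one such soft join, and the multiple heads of a transformer layer correspond to an ensemble of joins, matching the join-chain view described above.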