Paper Title
Set Interdependence Transformer: Set-to-Sequence Neural Networks for Permutation Learning and Structure Prediction
Paper Authors
Paper Abstract
The task of learning to map an input set onto a permuted sequence of its elements is challenging for neural networks. Set-to-sequence problems occur in natural language processing, computer vision and structure prediction, where interactions between elements of large sets define the optimal output. Models must exhibit relational reasoning, handle varying cardinalities and manage combinatorial complexity. Previous attention-based methods require $n$ layers of their set transformations to explicitly represent $n$-th order relations. Our aim is to enhance their ability to efficiently model higher-order interactions through an additional interdependence component. We propose a novel neural set encoding method called the Set Interdependence Transformer, capable of relating the set's permutation invariant representation to its elements within sets of any cardinality. We combine it with a permutation learning module into a complete, 3-part set-to-sequence model and demonstrate its state-of-the-art performance on a number of tasks. These range from combinatorial optimization problems, through permutation learning challenges on both synthetic and established NLP datasets for sentence ordering, to a novel domain of product catalog structure prediction. Additionally, the network's ability to generalize to unseen sequence lengths is investigated and a comparative empirical analysis of the existing methods' ability to learn higher-order interactions is provided.
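To make the architecture described in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch of a three-part set-to-sequence pipeline in the spirit of the description above: a permutation-equivariant set encoder, an interdependence step that relates a pooled, permutation-invariant set embedding back to the individual elements, and a pointer-style decoder that outputs a permutation. All class names, dimensions, and the greedy decoding scheme are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch only: (1) set encoder, (2) interdependence step,
# (3) pointer-style permutation decoder. Names and details are assumptions.
import torch
import torch.nn as nn


class SetAttentionBlock(nn.Module):
    """Self-attention over set elements (permutation equivariant)."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, n, dim)
        h, _ = self.attn(x, x, x)
        x = self.norm1(x + h)
        return self.norm2(x + self.ff(x))


class InterdependenceBlock(nn.Module):
    """Appends a pooled (permutation-invariant) set summary token and lets
    every element attend jointly to that summary and to each other."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.block = SetAttentionBlock(dim, heads)

    def forward(self, x):                          # x: (batch, n, dim)
        set_token = x.mean(dim=1, keepdim=True)    # invariant set summary
        joint = self.block(torch.cat([set_token, x], dim=1))
        return joint[:, 0], joint[:, 1:]           # set embedding, updated elements


class PointerDecoder(nn.Module):
    """Greedy pointer-style decoder producing a permutation of the input set."""

    def __init__(self, dim):
        super().__init__()
        self.query_rnn = nn.GRUCell(dim, dim)

    def forward(self, set_emb, elems):             # elems: (batch, n, dim)
        batch, n, _ = elems.shape
        query, order = set_emb, []
        mask = torch.zeros(batch, n, dtype=torch.bool, device=elems.device)
        for _ in range(n):
            scores = torch.einsum("bd,bnd->bn", query, elems)
            scores = scores.masked_fill(mask, float("-inf"))
            idx = scores.argmax(dim=-1)            # point at the next element
            order.append(idx)
            mask[torch.arange(batch), idx] = True  # never select it again
            picked = elems[torch.arange(batch), idx]
            query = self.query_rnn(picked, query)  # update the decoding state
        return torch.stack(order, dim=1)           # (batch, n) permutation indices


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 5, 32)                      # a batch of two 5-element sets
    encoder, interdep, decoder = SetAttentionBlock(32), InterdependenceBlock(32), PointerDecoder(32)
    set_emb, elems = interdep(encoder(x))
    print(decoder(set_emb, elems))                 # predicted permutations, shape (2, 5)
```

Concatenating the pooled set token with the element embeddings before a shared self-attention layer is one simple way to let every element condition on a whole-set summary within a single layer; the paper's actual interdependence mechanism may differ in its details.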