Paper Title

Sparse and Continuous Attention Mechanisms

Paper Authors

André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Paper Abstract

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g., sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1, 2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions.
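To make the contrast with softmax concrete, below is a minimal NumPy sketch of the finite-domain sparsemax transformation (the alpha = 2 case of alpha-entmax), which projects a score vector onto the probability simplex and can assign exact zero probability to low-scoring entries. This is an illustrative sketch, not code from the paper; the function name and example scores are our own.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex.

    Unlike softmax, the output can contain exact zeros for irrelevant entries.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    in_support = 1 + k * z_sorted > cumsum   # which sorted entries stay in the support
    k_z = k[in_support][-1]                  # support size
    tau = (cumsum[in_support][-1] - 1) / k_z # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.2, 0.8, -1.0]))  # [0.7 0.3 0. ]  -- exact zero on the weakest score
print(sparsemax([0.1, 0.1, 0.1]))   # uniform distribution, as with softmax on tied scores
```

The paper's continuous-domain attention generalizes this idea from categorical distributions over a finite set to densities over intervals (1D) or compact regions (2D).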
