Paper Title
EcoFormer: Energy-Saving Attention with Linear Complexity
Paper Authors
Paper Abstract
The Transformer is a transformative framework for modeling sequential data and has achieved remarkable performance on a wide range of tasks, but at a high computational and energy cost. To improve its efficiency, a popular choice is to compress the model via binarization, which constrains floating-point values to binary ones so that resource consumption drops significantly thanks to cheap bitwise operations. However, existing binarization methods only aim to statistically minimize the information loss of the input distribution, while ignoring the pairwise similarity modeling at the core of attention. To this end, we propose EcoFormer, a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, which maps the original queries and keys into low-dimensional binary codes in Hamming space. The kernelized hash functions are learned in a self-supervised way to match the ground-truth similarity relations extracted from the attention map. Based on the equivalence between the inner product of binary codes and the Hamming distance, as well as the associative property of matrix multiplication, we can approximate attention in linear complexity by expressing it as a dot product of binary codes. Moreover, the compact binary representations of queries and keys allow us to replace most of the expensive multiply-accumulate operations in attention with simple accumulations, saving a considerable on-chip energy footprint on edge devices. Extensive experiments on both vision and language tasks show that EcoFormer consistently achieves performance comparable to standard attention while consuming far fewer resources. For example, with PVTv2-B0 on ImageNet-1K, EcoFormer reduces the on-chip energy footprint by 73% with only a 0.33% performance drop compared to standard attention. Code is available at https://github.com/ziplab/EcoFormer.
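To make the linear-complexity mechanism concrete, below is a minimal sketch (not the authors' implementation; see the repository above for that) of attention with binarized queries and keys. It replaces the learned kernelized hash described in the abstract with a fixed random sign hash purely for illustration, and the names `binary_linear_attention`, `sign_hash`, and `proj` are hypothetical. The point it demonstrates is the associativity trick: because the hashed queries and keys are non-negative binary codes, `phi(K)^T V` can be computed once and reused, giving O(n·b·d) cost instead of the O(n²·d) of standard softmax attention.

```python
import torch


def sign_hash(x, proj):
    """Map features to binary codes in {0, 1}^b via hashing hyperplanes.

    EcoFormer learns these hash functions (kernelized hashing) to match
    similarities extracted from the attention map; here a fixed random
    projection stands in for the learned one, for illustration only.
    """
    return (x @ proj > 0).float()


def binary_linear_attention(q, k, v, proj):
    """Linear-complexity attention over binarized queries/keys (sketch).

    q, k: (n, d) queries and keys; v: (n, d_v) values; proj: (d, b) hyperplanes.
    Since the codes are binary and non-negative, phi(q) @ phi(k)^T is a valid
    unnormalized similarity, and matrix-multiplication associativity lets us
    form phi(K)^T V first, avoiding the n x n attention matrix entirely.
    """
    phi_q = sign_hash(q, proj)                      # (n, b), entries in {0, 1}
    phi_k = sign_hash(k, proj)                      # (n, b)
    kv = phi_k.t() @ v                              # (b, d_v), computed once
    z = phi_q @ phi_k.sum(dim=0, keepdim=True).t()  # (n, 1) row normalizer
    return (phi_q @ kv) / z.clamp(min=1e-6)         # (n, d_v)


# Usage with toy shapes (196 tokens, 64-dim heads, 16-bit codes).
n, d, b = 196, 64, 16
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
proj = torch.randn(d, b)
out = binary_linear_attention(q, k, v, proj)        # (196, 64)
```

Because `phi_q` and `phi_k` contain only zeros and ones, the matrix products involving them reduce to summing selected rows, i.e. accumulations rather than multiply-accumulates, which is where the abstract's on-chip energy saving comes from.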