Paper Title

Exploring Target Representations for Masked Autoencoders

Paper Authors

Xingbin Liu, Jinghao Zhou, Tao Kong, Xianming Lin, Rongrong Ji

Paper Abstract

Masked autoencoders have become popular training paradigms for self-supervised visual representation learning. These models randomly mask a portion of the input and reconstruct the masked portion according to the target representations. In this paper, we first show that a careful choice of the target representation is unnecessary for learning good representations, since different targets tend to derive similarly behaved models. Driven by this observation, we propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher, enabling us to effectively train high-capacity models without any efforts to carefully design target representations. Interestingly, we further explore using teachers of larger capacity, obtaining distilled students with remarkable transferring ability. On different tasks of classification, transfer learning, object detection, and semantic segmentation, the proposed method to perform masked knowledge distillation with bootstrapped teachers (dBOT) outperforms previous self-supervised methods by nontrivial margins. We hope our findings, as well as the proposed method, could motivate people to rethink the roles of target representations in pre-training masked autoencoders. The code and pre-trained models are publicly available at https://github.com/liuxingbin/dbot.
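
The abstract describes the pipeline only at a high level. Below is a minimal sketch, in PyTorch, of masked distillation with a bootstrapped teacher as described above: stage 0 distills from a frozen, randomly initialized teacher, and each later stage promotes the trained student to teacher and restarts the student. The encoder architecture, dimensions, masking scheme, and training loop are illustrative assumptions, not the released dBOT implementation (see the linked repository for that).

```python
# Minimal sketch of masked distillation with bootstrapped teachers.
# All modules and hyper-parameters here are illustrative stand-ins.
import copy
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """A toy ViT-style encoder: patch embedding + a few transformer blocks."""

    def __init__(self, img_size=32, patch=4, dim=64, depth=2, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.blocks(tokens + self.pos)


def masked_distill_loss(student, teacher, head, images, mask_ratio=0.75):
    """Student predicts the teacher's token representations at masked positions."""
    with torch.no_grad():
        targets = teacher(images)                        # full-view teacher tokens
    B, N, _ = targets.shape
    num_masked = int(mask_ratio * N)
    order = torch.rand(B, N, device=images.device).argsort(dim=1)
    mask = torch.zeros(B, N, dtype=torch.bool, device=images.device)
    mask[torch.arange(B, device=images.device).unsqueeze(1), order[:, :num_masked]] = True

    # Simplification: the student also sees the full image; a faithful
    # implementation would drop or replace the masked patches at the input.
    preds = head(student(images))
    return nn.functional.smooth_l1_loss(preds[mask], targets[mask])


def train_bootstrapped(num_stages=3, steps_per_stage=100, dim=64):
    """Stage 0 uses a random frozen teacher; later stages bootstrap it from the student."""
    teacher = TinyEncoder(dim=dim).eval()
    teacher.requires_grad_(False)
    for stage in range(num_stages):
        student, pred_head = TinyEncoder(dim=dim), nn.Linear(dim, dim)
        params = list(student.parameters()) + list(pred_head.parameters())
        opt = torch.optim.AdamW(params, lr=1e-4)
        for _ in range(steps_per_stage):
            images = torch.randn(8, 3, 32, 32)           # stand-in for real data
            loss = masked_distill_loss(student, teacher, pred_head, images)
            opt.zero_grad()
            loss.backward()
            opt.step()
        teacher = copy.deepcopy(student).eval()          # student becomes next teacher
        teacher.requires_grad_(False)
    return teacher
```

The key design choice mirrored here is that the teacher's targets carry no hand-crafted semantics at stage 0; the multi-stage bootstrapping, rather than the initial target representation, is what drives the quality of the learned features.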
