Paper Title
Visual Prompting via Image Inpainting
Paper Authors
Abstract
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked autoencoders on a new dataset that we curated - 88k unlabeled figures sourced from academic papers on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, etc.
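To make the setup concrete, the sketch below assembles the concatenated visual prompt described in the abstract: a 2x2 grid whose top row is the (input, output) example pair and whose bottom row is the query input next to a hole for the model to inpaint. This is a minimal illustration, not the paper's released code; `build_visual_prompt` and `mae_inpaint` are hypothetical names, and a real masked autoencoder fills masked patch tokens rather than arbitrary pixel regions.

```python
import numpy as np

def build_visual_prompt(example_input, example_output, query_input, cell=112):
    """Assemble the concatenated visual prompt image.

    Each argument is assumed to be a (cell, cell, 3) uint8 array;
    cell=112 yields a 224x224 canvas, a common MAE input size.
    """
    canvas = np.zeros((2 * cell, 2 * cell, 3), dtype=np.uint8)
    canvas[:cell, :cell] = example_input    # top-left: example task input
    canvas[:cell, cell:] = example_output   # top-right: example task output
    canvas[cell:, :cell] = query_input      # bottom-left: new query input
    # The bottom-right quadrant is left empty: this is the hole to fill.

    # Binary mask marking the hole (1 = pixels the inpainting model must predict).
    mask = np.zeros((2 * cell, 2 * cell), dtype=np.uint8)
    mask[cell:, cell:] = 1
    return canvas, mask

# Hypothetical usage; `mae_inpaint` stands in for a pretrained masked
# autoencoder's inpainting call and is an assumed interface:
#   canvas, mask = build_visual_prompt(x_in, x_out, x_query)
#   completed = mae_inpaint(canvas, mask)
#   prediction = completed[112:, 112:]  # bottom-right quadrant = output image
```

Under this framing, every image-to-image task (segmentation, colorization, edge detection, and so on) reduces to the same inpainting call; only the example pair placed in the top row changes.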