Paper Title
Representation Learning with Diffusion Models
Paper Authors
Paper Abstract
Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation. When applied in the latent space of a powerful pretrained autoencoder (LDMs), their immense computational requirements can be significantly reduced without sacrificing sampling quality. However, DMs and LDMs lack a semantically meaningful representation space, as the diffusion process gradually destroys information in the latent variables. We introduce a framework for learning such representations with diffusion models (LRDM). To that end, an LDM is conditioned on the representation extracted from the clean image by a separate encoder. In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process. By introducing a tractable representation prior, we can efficiently sample from the representation distribution for unconditional image synthesis without training any additional model. We demonstrate that i) competitive image generation results can be achieved with image-parameterized LDMs, and ii) LRDMs are capable of learning semantically meaningful representations, allowing for faithful image reconstructions and semantic interpolations. Our implementation is available at https://github.com/jeremiastraub/diffusion.
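To make the joint-training idea in the abstract concrete, below is a minimal PyTorch sketch of one training step: an encoder maps the clean latent to a representation (with a reparameterized Gaussian so a KL term to a standard-normal prior keeps the representation distribution tractable), and the denoiser is conditioned on that representation. All names (`RepresentationEncoder`, `forward_diffuse`, `training_step`, `kl_weight`, `rep_dim`) and the specific architecture are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual code.

```python
# Hypothetical sketch of joint LRDM-style training (not the authors' code).
import torch
import torch.nn as nn

class RepresentationEncoder(nn.Module):
    """Maps a clean latent z_0 to a stochastic representation c."""
    def __init__(self, latent_dim=4, rep_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(128, rep_dim)
        self.to_logvar = nn.Linear(128, rep_dim)

    def forward(self, z0):
        h = self.net(z0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterized sample; a KL term to N(0, I) makes the
        # representation prior tractable for unconditional sampling.
        c = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return c, mu, logvar

def forward_diffuse(z0, t, noise, num_timesteps=1000):
    # q(z_t | z_0) with a linear beta schedule:
    # z_t = sqrt(a_bar_t) * z_0 + sqrt(1 - a_bar_t) * noise
    betas = torch.linspace(1e-4, 0.02, num_timesteps, device=z0.device)
    a_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise

def training_step(denoiser, encoder, z0, num_timesteps=1000, kl_weight=1e-3):
    """One joint step: denoising loss plus KL on the representation.
    `denoiser` is any module with signature denoiser(z_t, t, c)."""
    t = torch.randint(0, num_timesteps, (z0.shape[0],), device=z0.device)
    noise = torch.randn_like(z0)
    z_t = forward_diffuse(z0, t, noise, num_timesteps)
    c, mu, logvar = encoder(z0)        # representation from the *clean* latent
    pred = denoiser(z_t, t, c)          # conditioned noise (or image) prediction
    diff_loss = torch.mean((pred - noise) ** 2)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return diff_loss + kl_weight * kl
```

Because both modules receive gradients from the same loss, the encoder is pushed to retain exactly the information the denoiser needs, which is the "representations specific to the generative denoising process" mentioned in the abstract; the KL weight trades reconstruction fidelity against how closely the representation distribution matches the prior.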