Faces à la Carte: Text-to-Face Generation via Attribute Disentanglement

Paper Title

Faces à la Carte: Text-to-Face Generation via Attribute Disentanglement

Authors

Tianren Wang, Teng Zhang, Brian Lovell

Abstract

Text-to-Face (TTF) synthesis is a challenging task with great potential for diverse computer vision applications. Compared to Text-to-Image (TTI) synthesis tasks, the textual description of faces can be much more complicated and detailed due to the variety of facial attributes and the parsing of high dimensional abstract natural language. In this paper, we propose a Text-to-Face model that not only produces images in high resolution (1024x1024) with text-to-image consistency, but also outputs multiple diverse faces to cover a wide range of unspecified facial features in a natural way. By fine-tuning the multi-label classifier and image encoder, our model obtains the vectors and image embeddings which are used to transform the input noise vector sampled from the normal distribution. Afterwards, the transformed noise vector is fed into a pre-trained high-resolution image generator to produce a set of faces with the desired facial attributes. We refer to our model as TTF-HD. Experimental results show that TTF-HD generates high-quality faces with state-of-the-art performance.
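
The abstract describes the pipeline only at a high level: noise is sampled from a normal distribution, transformed using attribute information, and fed to a frozen generator. The sketch below is a toy illustration of that general idea and not the authors' TTF-HD implementation. It assumes a linear stand-in generator and a linear stand-in multi-label classifier, and it swaps in plain gradient descent on the noise vector as the transformation step. All names (`steer_noise`, `sample_faces`, `predict_attributes`) and all dimensions are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matvec_t(m, v):
    """Multiply the transpose of m by v."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]

def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def predict_attributes(z, gen, clf):
    """Frozen toy pipeline: noise -> 'image' features -> attribute probabilities."""
    return [sigmoid(s) for s in matvec(clf, matvec(gen, z))]

def steer_noise(z, target, gen, clf, steps=1000, lr=0.05):
    """Gradient descent on z only; gen and clf stay fixed (assumed 'pre-trained')."""
    z = list(z)
    for _ in range(steps):
        probs = predict_attributes(z, gen, clf)
        # Gradient of binary cross-entropy w.r.t. the logits is (p - t).
        err = [p - t for p, t in zip(probs, target)]
        # Back-propagate through the two linear maps: gen^T (clf^T err).
        grad_z = matvec_t(gen, matvec_t(clf, err))
        z = [zi - lr * gi for zi, gi in zip(z, grad_z)]
    return z

def sample_faces(target, n_samples=4, latent_dim=16, image_dim=8, seed=0):
    """Draw several noise vectors and steer each toward the target attributes."""
    rng = random.Random(seed)
    gen = random_matrix(image_dim, latent_dim, rng)    # stand-in generator
    clf = random_matrix(len(target), image_dim, rng)   # stand-in classifier
    latents = []
    for _ in range(n_samples):
        z0 = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        latents.append(steer_noise(z0, target, gen, clf))
    return gen, clf, latents

# Hypothetical target: attribute 0 present, attribute 1 absent, attribute 2 present.
target = [1.0, 0.0, 1.0]
gen, clf, latents = sample_faces(target)
for z in latents:
    print([round(p, 2) for p in predict_attributes(z, gen, clf)])
```

In this toy setting each update only moves the noise vector along directions the classifier is sensitive to, so every sample keeps the rest of its random starting point. That is one simple way to picture how a single description could map to several different faces that agree on the specified attributes.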
