Paper Title

SceneFormer: Indoor Scene Generation with Transformers

Authors

Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner

Abstract

We address the task of indoor scene generation by generating a sequence of objects, along with their locations and orientations conditioned on a room layout. Large-scale indoor scene datasets allow us to extract patterns from user-designed indoor scenes, and generate new scenes based on these patterns. Existing methods rely on the 2D or 3D appearance of these scenes in addition to object positions, and make assumptions about the possible relations between objects. In contrast, we do not use any appearance information, and implicitly learn object relations using the self-attention mechanism of transformers. We show that our model design leads to faster scene generation with similar or improved levels of realism compared to previous methods. Our method is also flexible, as it can be conditioned not only on the room layout but also on text descriptions of the room, using only the cross-attention mechanism of transformers. Our user study shows that our generated scenes are preferred to the state-of-the-art FastSynth scenes 53.9% and 56.7% of the time for bedroom and living room scenes, respectively. At the same time, we generate a scene in 1.48 seconds on average, 20% faster than FastSynth.
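To make the abstract's core idea concrete, here is a minimal sketch (not the authors' released code) of autoregressive scene generation with a transformer: objects placed so far attend to each other through masked self-attention, which is how object relations are learned implicitly, while the room-layout condition is consumed through cross-attention. All module names, dimensions, token vocabulary, and the layout encoding below are illustrative assumptions; the paper's full model also predicts locations and orientations with analogous heads.

```python
# Illustrative sketch of transformer-based object-sequence generation,
# conditioned on a room layout via cross-attention. Hyperparameters,
# the token layout, and the layout encoder are assumptions.
import torch
import torch.nn as nn


class SceneDecoderSketch(nn.Module):
    def __init__(self, num_categories=32, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Embedding over object-category tokens plus start/stop tokens.
        self.embed = nn.Embedding(num_categories + 2, d_model)
        self.pos = nn.Parameter(torch.zeros(1, 64, d_model))  # learned positions
        layer = nn.TransformerDecoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        # Head for the next object's category; location/orientation heads
        # would be analogous (e.g., classification over discretized bins).
        self.cat_head = nn.Linear(d_model, num_categories + 2)

    def forward(self, obj_tokens, layout_feats):
        # obj_tokens:   (B, T) category ids of objects placed so far.
        # layout_feats: (B, L, d_model) encoded room-layout features
        #               (e.g., from a small CNN over the floor plan),
        #               used here as cross-attention "memory".
        B, T = obj_tokens.shape
        x = self.embed(obj_tokens) + self.pos[:, :T]
        # Causal mask: each position attends only to earlier objects, so
        # self-attention models inter-object relations autoregressively.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.decoder(x, layout_feats, tgt_mask=mask)
        return self.cat_head(h)  # (B, T, vocab) next-token logits


# Usage: greedy generation of a short object sequence for one room.
model = SceneDecoderSketch().eval()
layout = torch.randn(1, 16, 128)   # stand-in layout encoding
seq = torch.tensor([[32]])         # assumed start-token id
with torch.no_grad():
    for _ in range(5):
        logits = model(seq, layout)
        nxt = logits[:, -1].argmax(-1, keepdim=True)
        seq = torch.cat([seq, nxt], dim=1)
print(seq)
```

Conditioning on a text description instead of a layout would, per the abstract, reuse the same cross-attention path: the layout features above would simply be replaced by encoded text features.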
