Paper Title
Learning to simulate complex scenes
Paper Authors
Paper Abstract
Data simulation engines such as Unity are becoming an increasingly important data source, as they allow ground-truth labels to be acquired conveniently. Moreover, we can flexibly edit the content of an image in the engine, such as objects (position, orientation) and environments (illumination, occlusion). When simulated data are used as training sets, their editable content can be leveraged to mimic the distribution of real-world data and thus reduce the content difference between the synthetic and real domains. This paper explores content adaptation in the context of semantic segmentation, where complex street scenes are fully synthesized from a first-person driver perspective using 19 classes of virtual objects and controlled by 23 attributes. To optimize the attribute values and obtain a training set whose content is similar to real-world data, we propose a scalable discretization-and-relaxation (SDR) approach. Under a reinforcement learning framework, we formulate attribute optimization as a random-to-optimized mapping problem using a neural network. Our method has three characteristics. 1) Instead of editing attributes of individual objects, we focus on global attributes that have a large influence on the scene structure, such as object density and illumination. 2) Attributes are quantized to discrete values, so as to reduce the search space and training complexity. 3) Correlated attributes are jointly optimized in a group, so as to avoid meaningless scene structures and find better convergence points. Experiments show that our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy compared with existing synthetic training sets.
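For illustration only, the sketch below shows one way the discretize-and-group idea described in the abstract could look in code: global attributes are quantized into a few bins, correlated attributes share one categorical policy, and the policy is updated with a REINFORCE-style rule against a reward signal. The attribute names, bin counts, grouping, learning-rate settings, and the evaluate_scene() reward are all illustrative assumptions, not the authors' implementation; in the paper the reward would come from rendering scenes with the chosen attributes and measuring real-world segmentation accuracy.

# Minimal sketch (not the authors' code) of discretized, grouped attribute
# optimization with a simple REINFORCE-style update and a placeholder reward.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute groups with discrete bin counts (assumption).
GROUPS = {
    "illumination": {"sun_height": 4, "sun_angle": 4},
    "object_density": {"car_density": 5, "pedestrian_density": 5},
}

# One logit vector per group over the joint discrete choices of its attributes.
logits = {g: np.zeros(int(np.prod(list(a.values())))) for g, a in GROUPS.items()}


def sample_group(g):
    """Sample one joint discrete setting for all attributes in group g."""
    p = np.exp(logits[g] - logits[g].max())
    p /= p.sum()
    idx = rng.choice(len(p), p=p)
    return idx, p


def evaluate_scene(choices):
    """Placeholder reward: stands in for rendering scenes with the chosen
    attribute values, training a segmentation model, and scoring it on real
    images. Here it simply prefers mid-range settings in every group."""
    return -float(sum(abs(i - len(logits[g]) // 2) for g, i in choices.items()))


LR, baseline = 0.1, 0.0
for step in range(300):
    choices, probs = {}, {}
    for g in GROUPS:
        choices[g], probs[g] = sample_group(g)
    reward = evaluate_scene(choices)
    advantage = reward - baseline
    baseline += 0.05 * (reward - baseline)  # running-mean baseline
    # REINFORCE-style update: raise the log-probability of the sampled
    # joint choice in each group in proportion to its advantage.
    for g, idx in choices.items():
        grad = -probs[g]
        grad[idx] += 1.0
        logits[g] += LR * advantage * grad

print({g: int(np.argmax(logits[g])) for g in GROUPS})

Grouping correlated attributes into one categorical distribution, as in this toy example, is what lets jointly implausible combinations (for example, dense night-time crowds under full daylight shadows) be ruled out as a single choice rather than discovered by chance across independent per-attribute policies.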