On the generalization of learning-based 3D reconstruction

Paper title

On the generalization of learning-based 3D reconstruction

Authors

Miguel Angel Bautista, Walter Talbott, Shuangfei Zhai, Nitish Srivastava, Joshua M. Susskind

Abstract

State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training. In this paper we study the inductive biases encoded in the model architecture that impact the generalization of learning-based 3D reconstruction methods. We find that 3 inductive biases impact performance: the spatial extent of the encoder, the use of the underlying geometry of the scene to describe point features, and the mechanism to aggregate information from multiple views. Additionally, we propose mechanisms to enforce those inductive biases: a point representation that is aware of camera position, and a variance cost to aggregate information across views. Our model achieves state-of-the-art results on the standard ShapeNet 3D reconstruction benchmark in various settings.
