Title
A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-view Stereo Reconstruction from An Open Aerial Dataset
Authors
Abstract
A great deal of research has recently demonstrated that multi-view stereo (MVS) matching can be solved with deep learning methods. However, these efforts have focused on close-range objects, and only a few deep learning-based methods have been specifically designed for large-scale 3D urban reconstruction, owing to the lack of multi-view aerial image benchmarks. In this paper, we present a synthetic aerial dataset, called the WHU dataset, which we created for MVS tasks and which, to our knowledge, is the first large-scale multi-view aerial dataset. It was generated from a highly accurate 3D digital surface model produced from thousands of real aerial images with precise camera parameters. We also introduce a novel network, called RED-Net, for wide-range depth inference, which we developed from a recurrent encoder-decoder structure that regularizes cost maps across depths, with a 2D fully convolutional network as its framework. RED-Net's low memory requirements and high performance make it suitable for large-scale and highly accurate 3D Earth surface reconstruction. Our experiments confirmed that our method not only exceeded current state-of-the-art MVS methods by more than 50% in mean absolute error (MAE) with lower memory and computational cost, but was also more efficient: it outperformed one of the best commercial software programs based on conventional methods, improving efficiency 16-fold. Moreover, we showed that a RED-Net model pre-trained on the synthetic WHU dataset can be efficiently transferred to very different multi-view aerial image datasets without any fine-tuning. The dataset is available at http://gpcv.whu.edu.cn/data.
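To illustrate why a recurrent encoder-decoder over 2D cost maps keeps memory low, the sketch below processes depth hypotheses one at a time with a GRU-style update, so only a single 2D hidden map is held in memory instead of a full 3D cost volume. This is a minimal, hypothetical stand-in, not the paper's architecture: the scalar gate weights (`wz`, `uz`, etc.) replace the learned convolutional encoder-decoder units of RED-Net, and the function name is our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_cost_regularization(cost_maps, wz, uz, wr, ur, wh, uh):
    """Regularize 2D cost maps sequentially across the depth dimension
    with a per-pixel GRU (an element-wise simplification of the
    convolutional recurrent regularization described in the abstract).

    cost_maps: iterable of (H, W) arrays, one per depth hypothesis.
    wz..uh:    scalar gate weights (stand-ins for learned filters).
    """
    regularized = []
    h = None
    for c in cost_maps:                    # iterate over depth hypotheses
        if h is None:
            h = np.zeros_like(c)           # single 2D hidden state
        z = sigmoid(wz * c + uz * h)       # update gate
        r = sigmoid(wr * c + ur * h)       # reset gate
        h_tilde = np.tanh(wh * c + uh * (r * h))
        h = (1.0 - z) * h + z * h_tilde    # only one (H, W) map kept live
        regularized.append(h)
    return np.stack(regularized)           # (D, H, W) regularized costs

# Usage sketch: 8 depth hypotheses over a 4x4 image patch.
cost_volume = np.random.rand(8, 4, 4)
reg = recurrent_cost_regularization(cost_volume, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5)
depth_index = np.argmin(reg, axis=0)       # per-pixel best depth hypothesis
```

The peak memory of the loop is proportional to one cost map (H x W), not to the number of depth planes, which is the property the abstract credits for RED-Net's scalability to wide depth ranges.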