Paper Title
Monocular Depth Estimation with Attention-based Encoder-Decoder Network from a Single Image
Paper Authors
Abstract
Depth information is the foundation of perception and is essential for autonomous driving, robotics, and other resource-constrained applications. Promptly obtaining accurate depth information allows a system to respond quickly in dynamic environments. Sensor-based methods using LiDAR and radar achieve high precision at the cost of high power consumption, price, and volume. Owing to advances in deep learning, vision-based approaches have recently received much attention and can overcome these drawbacks. In this work, we explore an extreme scenario in the vision-based setting: estimating a depth map from a single monocular image, a task severely plagued by grid artifacts and blurry edges. To address this scenario, we first design a convolutional attention mechanism block (CAMB), which applies channel attention and spatial attention sequentially, and we insert these CAMBs into the skip connections. As a result, our approach can find the focus of the current image with minimal overhead and avoid the loss of depth features. Next, by combining the depth value, the gradients along the x-axis, y-axis, and diagonal directions, and the structural similarity index measure (SSIM), we propose a novel loss function. Moreover, we utilize pixel blocks to accelerate its computation. Finally, through comprehensive experiments on two large-scale image datasets, i.e., KITTI and NYU-V2, we show that our method outperforms several representative baselines.
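The loss the abstract describes combines a point-wise depth term, gradient terms along the x, y, and diagonal directions, and an SSIM term, computed over pixel blocks for speed. The following is a minimal NumPy sketch of that combination, not the paper's implementation: the function names, equal term weights, block size, and the simplified single-window SSIM are all illustrative assumptions.

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    # Simplified single-window SSIM between two depth maps (assumed in [0, 1]);
    # the paper may use a sliding Gaussian window instead.
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))

def gradients(d):
    # Finite differences along x, y, and the two diagonal directions.
    gx = d[:, 1:] - d[:, :-1]
    gy = d[1:, :] - d[:-1, :]
    gd1 = d[1:, 1:] - d[:-1, :-1]   # main diagonal
    gd2 = d[1:, :-1] - d[:-1, 1:]   # anti-diagonal
    return gx, gy, gd1, gd2

def block_pool(d, k):
    # Average over k x k pixel blocks so the loss is evaluated on a coarser
    # grid -- one plausible reading of the "pixel block" acceleration.
    h, w = d.shape
    return d[:h // k * k, :w // k * k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def depth_loss(pred, gt, block=4, w_grad=1.0, w_ssim=1.0):
    # Weights and block size are illustrative, not taken from the paper.
    p, g = block_pool(pred, block), block_pool(gt, block)
    l_depth = np.abs(p - g).mean()
    l_grad = sum(np.abs(a - b).mean() for a, b in zip(gradients(p), gradients(g)))
    l_ssim = (1.0 - ssim(p, g)) / 2.0
    return l_depth + w_grad * l_grad + w_ssim * l_ssim
```

Pooling into blocks before differencing shrinks every term's operand by a factor of `block**2`, which is where the claimed speed-up would come from in this sketch.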