Paper Title
Self-Supervised Feature Learning for Long-Term Metric Visual Localization
Paper Authors
Paper Abstract
Visual localization is the task of estimating camera pose in a known scene, which is an essential problem in robotics and computer vision. However, long-term visual localization is still a challenge due to the environmental appearance changes caused by lighting and seasons. While techniques exist to address appearance changes using neural networks, these methods typically require ground-truth pose information to generate accurate image correspondences or to act as a supervisory signal during training. In this paper, we present a novel self-supervised feature learning framework for metric visual localization. We use a sequence-based image matching algorithm across different sequences of images (i.e., experiences) to generate image correspondences without ground-truth labels. We can then sample image pairs to train a deep neural network that learns sparse features with associated descriptors and scores without ground-truth pose supervision. The learned features can be used together with a classical pose estimator for visual stereo localization. We validate the learned features by integrating them with an existing Visual Teach & Repeat pipeline to perform closed-loop localization experiments under different lighting conditions for a total of 22.4 km.
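The key idea in the abstract, generating cross-experience image correspondences from sequence alignment alone rather than from ground-truth poses, can be illustrated with a small SeqSLAM-style sketch. This is not the paper's implementation: the downsampled-patch descriptor, the window half-length `L`, and the candidate slope set are illustrative assumptions. In the paper's pipeline, image pairs sampled from such correspondences would then train the feature network, and the learned sparse features would feed a classical pose estimator.

```python
# Minimal sketch (assumed, not the paper's code): sequence-based matching between
# two image sequences ("experiences") to produce frame correspondences without
# ground-truth poses.
import numpy as np

def global_descriptor(img, size=(32, 24)):
    """Downsample and normalize a grayscale image into a flat global descriptor."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, size[1]).astype(int)
    xs = np.linspace(0, w - 1, size[0]).astype(int)
    small = img[np.ix_(ys, xs)].astype(np.float32)
    small = (small - small.mean()) / (small.std() + 1e-6)
    return small.ravel()

def sequence_match(desc_a, desc_b, L=10, slopes=(0.8, 1.0, 1.25)):
    """For each frame in experience A, return the best-matching frame index in B.

    desc_a: (Na, D) descriptors of the query experience
    desc_b: (Nb, D) descriptors of the reference experience
    L:      half-length of the local sequence window (illustrative choice)
    slopes: candidate relative velocities between the two traversals
    """
    # Pairwise descriptor-difference matrix between the two experiences.
    diff = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    Na, Nb = diff.shape
    matches = np.full(Na, -1, dtype=int)
    ks = np.arange(-L, L + 1)
    for i in range(L, Na - L):
        best_cost, best_j = np.inf, -1
        for j in range(L, Nb - L):
            for s in slopes:
                # Accumulate cost along a straight trajectory through (i, j) with slope s.
                js = np.clip(np.round(j + s * ks).astype(int), 0, Nb - 1)
                cost = diff[i + ks, js].mean()
                if cost < best_cost:
                    best_cost, best_j = cost, j
        matches[i] = best_j
    return matches

if __name__ == "__main__":
    # Synthetic demo: two noisy traversals of the same "scene signal".
    rng = np.random.default_rng(0)
    base = rng.random((60, 240, 320)).astype(np.float32)
    seq_a = base + 0.05 * rng.standard_normal(base.shape)
    seq_b = base + 0.05 * rng.standard_normal(base.shape)
    da = np.stack([global_descriptor(f) for f in seq_a])
    db = np.stack([global_descriptor(f) for f in seq_b])
    print(sequence_match(da, db)[10:20])  # should be close to [10, 11, ..., 19]
```

In this sketch the cost of a candidate match is averaged over a short window of frames rather than taken from a single image pair, which is what makes the alignment robust to individual frames with strong appearance change; the resulting frame pairs stand in for the training correspondences that the paper obtains without ground-truth labels.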