Paper title
Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
Paper authors
Paper abstract
Stereo depth estimation relies on optimal correspondence matching between pixels on epipolar lines in the left and right images to infer depth. In this work, we revisit the problem from a sequence-to-sequence correspondence perspective to replace cost volume construction with dense pixel matching using position information and attention. This approach, named STereo TRansformer (STTR), has several advantages: It 1) relaxes the limitation of a fixed disparity range, 2) identifies occluded regions and provides confidence estimates, and 3) imposes uniqueness constraints during the matching process. We report promising results on both synthetic and real-world datasets and demonstrate that STTR generalizes across different domains, even without fine-tuning.
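To make the matching idea in the abstract concrete, here is a minimal sketch of attention-based correspondence along a single epipolar line. It is not the STTR implementation: the function names, the toy one-hot features, and the `temperature` parameter are assumptions made for illustration, and the uniqueness constraint and learned position encoding described in the abstract are not modelled. Each left pixel attends over every right pixel on the same line, so no fixed disparity range is imposed, and the peak attention weight serves as a rough confidence score.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_epipolar_line(left_feats, right_feats, temperature=1.0):
    """Match each left pixel to a right pixel on the same epipolar line.

    left_feats / right_feats: one feature vector (list of floats) per pixel.
    Returns one (disparity, confidence) pair per left pixel, where
    disparity = x_left - x_right and confidence is the peak attention weight.
    """
    results = []
    for x_left, f_left in enumerate(left_feats):
        # Dot-product similarity against every right pixel (no disparity cap).
        scores = [
            sum(a * b for a, b in zip(f_left, f_right)) / temperature
            for f_right in right_feats
        ]
        weights = softmax(scores)
        x_right = max(range(len(weights)), key=weights.__getitem__)
        results.append((x_left - x_right, weights[x_right]))
    return results


# Toy data (hypothetical): six left pixels with distinct one-hot descriptors.
left = [[5.0 if j == i else 0.0 for j in range(6)] for i in range(6)]
# The right line sees the same content shifted by two pixels; the last two
# right pixels carry no matching content.
right = [left[x + 2] for x in range(4)] + [[0.0] * 6, [0.0] * 6]

for x, (disparity, confidence) in enumerate(match_epipolar_line(left, right)):
    print(f"x_left={x} disparity={disparity} confidence={confidence:.3f}")
```

In this toy case, left pixels 2 to 5 have a counterpart in the right line and are matched with a disparity of 2 and a confidence close to 1. Left pixels 0 and 1 have no counterpart, so their attention is spread evenly and the confidence is low; a low peak weight is one simple signal that a pixel may be occluded.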