Paper Title

Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation

Paper Authors

Jinman Park, Kimathi Kaai, Saad Hossain, Norikatsu Sumi, Sirisha Rambhatla, Paul Fieguth

Paper Abstract

Egocentric 3D human pose estimation (HPE) from images is challenging due to severe self-occlusions and strong distortion introduced by the fish-eye view from the head mounted camera. Although existing works use intermediate heatmap-based representations to counter distortion with some success, addressing self-occlusion remains an open problem. In this work, we leverage information from past frames to guide our self-attention-based 3D HPE estimation procedure -- Ego-STAN. Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps. We also propose feature map tokens: a new set of learnable parameters to attend to these feature maps. Finally, we demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset where it achieves a 30.6% improvement on the overall mean per-joint position error, while leading to a 22% drop in parameters compared to the state-of-the-art.
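
The abstract's core architectural idea is a spatio-temporal Transformer that attends to CNN feature maps gathered from past frames, augmented with a set of learnable "feature map tokens." Below is a minimal PyTorch sketch of that idea; the token count, embedding dimension, layer sizes, and regression head are all illustrative assumptions, not the authors' actual Ego-STAN implementation.

```python
# Minimal sketch, assuming: feature maps from a CNN backbone over T past
# frames, learnable feature map tokens prepended to the flattened
# spatio-temporal sequence, and a linear pose-regression head. All
# hyperparameters below are hypothetical.
import torch
import torch.nn as nn

class FeatureMapTokenTransformer(nn.Module):
    def __init__(self, feat_dim=256, num_tokens=16, num_joints=16,
                 depth=4, heads=8):
        super().__init__()
        # Learnable feature map tokens (count and size are assumptions).
        self.fm_tokens = nn.Parameter(torch.zeros(1, num_tokens, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regress 3D joint coordinates from the token outputs (assumed head).
        self.head = nn.Linear(num_tokens * feat_dim, num_joints * 3)

    def forward(self, feat_maps):
        # feat_maps: (B, T, C, H, W) CNN feature maps from T past frames.
        B, T, C, H, W = feat_maps.shape
        seq = feat_maps.flatten(3).permute(0, 1, 3, 2)  # (B, T, H*W, C)
        seq = seq.reshape(B, T * H * W, C)              # spatio-temporal sequence
        tokens = self.fm_tokens.expand(B, -1, -1)
        # Self-attention lets the learnable tokens attend to the feature maps.
        out = self.encoder(torch.cat([tokens, seq], dim=1))
        pose = self.head(out[:, :tokens.shape[1]].reshape(B, -1))
        return pose.view(B, -1, 3)                      # (B, num_joints, 3)
```

In this reading, the feature map tokens play a role analogous to a ViT class token: rather than regressing the pose from every spatio-temporal position, a small fixed set of learnable queries aggregates evidence across frames, which keeps the regression head compact.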
