GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer

Paper title

GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer

Authors

Ekkasit Pinyoanuntapong, Ayman Ali, Pu Wang, Minwoo Lee, Chen Chen

Abstract

Most existing gait recognition methods are appearance-based and rely on silhouettes extracted from video data of human walking activities. The less-investigated skeleton-based gait recognition methods learn the gait dynamics directly from 2D/3D human skeleton sequences, which makes them theoretically more robust to appearance changes caused by clothes, hairstyles, and carried objects. However, the performance of skeleton-based solutions still lags well behind that of appearance-based ones. This paper aims to close this performance gap by proposing a novel network model, GaitMixer, to learn more discriminative gait representations from skeleton sequence data. In particular, GaitMixer follows a heterogeneous multi-axial mixer architecture, which uses a spatial self-attention mixer followed by a temporal large-kernel convolution mixer to learn rich multi-frequency signals in the gait feature maps. Experiments on the widely used gait database CASIA-B demonstrate that GaitMixer outperforms the previous SOTA skeleton-based methods by a large margin while achieving competitive performance compared with representative appearance-based solutions. Code will be available at https://github.com/exitudio/gaitmixer
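
As a rough illustration of the ordering described in the abstract (self-attention across joints within each frame, then a large-kernel convolution along the time axis), the following plain-Python sketch may help. It is not the authors' implementation: the function names, the identity query/key/value projections, the single attention head, the residual connections, and the uniform example kernel are all assumptions made for illustration. The released code at the URL above is the authoritative reference.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention_mixer(frame):
    """Mix information across the joints of ONE frame.

    frame: list of J joint vectors, each a list of C floats.
    Assumption: identity query/key/value projections and a single head;
    a trained model would use learned projections instead.
    """
    scale = math.sqrt(len(frame[0]))
    mixed = []
    for q in frame:
        # Scaled dot-product score of this joint against every joint.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frame]
        weights = softmax(scores)
        out = [sum(w * v[c] for w, v in zip(weights, frame))
               for c in range(len(q))]
        # Residual connection (assumed for this sketch).
        mixed.append([o + x for o, x in zip(out, q)])
    return mixed


def temporal_conv_mixer(seq, kernel):
    """Depthwise 1D convolution along time with zero 'same' padding.

    seq: nested list indexed as [T frames][J joints][C channels].
    kernel: odd-length list of weights shared by all joints and channels
    (a hypothetical fixed kernel; a real model would learn it).
    """
    T, J, C = len(seq), len(seq[0]), len(seq[0][0])
    half = len(kernel) // 2
    out = [[[0.0] * C for _ in range(J)] for _ in range(T)]
    for t in range(T):
        for j in range(J):
            for c in range(C):
                acc = 0.0
                for i, w in enumerate(kernel):
                    s = t + i - half
                    if 0 <= s < T:  # positions outside the clip count as zero
                        acc += w * seq[s][j][c]
                # Residual connection (assumed for this sketch).
                out[t][j][c] = seq[t][j][c] + acc
    return out


def mixer_block(seq, kernel):
    """Spatial mixing per frame, followed by temporal mixing."""
    spatial = [spatial_attention_mixer(frame) for frame in seq]
    return temporal_conv_mixer(spatial, kernel)


# Toy input: 4 frames, 3 joints, 2 channels (e.g. x/y coordinates).
toy = [[[0.1 * (t + j), 0.2 * (t - j)] for j in range(3)] for t in range(4)]
wide_kernel = [1.0 / 9] * 9  # a 9-tap averaging kernel, chosen arbitrarily
features = mixer_block(toy, wide_kernel)
```

The output keeps the input layout (frames, joints, channels), so blocks like this could in principle be stacked. The kernel here is wider than the toy clip only to show that the padding logic handles it; the kernel size actually used in GaitMixer is not stated in the abstract.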
