Paper Title

Pose-Guided Graph Convolutional Networks for Skeleton-Based Action Recognition

Paper Authors

Han Chen, Yifan Jiang, Hanseok Ko

Paper Abstract

Graph convolutional networks (GCNs), which can model human body skeletons as spatial and temporal graphs, have shown remarkable potential in skeleton-based action recognition. However, in existing GCN-based methods, the graph-structured representation of the human skeleton makes it difficult to fuse with other modalities, especially in the early stages. This may limit their scalability and performance in action recognition tasks. In addition, pose information, which naturally contains informative and discriminative clues for action recognition, is rarely explored together with skeleton data in existing methods. In this work, we propose the pose-guided GCN (PG-GCN), a multi-modal framework for high-performance human action recognition. In particular, a multi-stream network is constructed to simultaneously explore robust features from both the pose and skeleton data, while a dynamic attention module is designed for early-stage feature fusion. The core idea of this module is to utilize a trainable graph to aggregate features from the skeleton stream with those of the pose stream, which leads to a network with stronger feature representation ability. Extensive experiments show that the proposed PG-GCN can achieve state-of-the-art performance on the NTU RGB+D 60 and NTU RGB+D 120 datasets.
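For readers who want a concrete picture of the fusion mechanism described above, the following is a minimal PyTorch-style sketch of a trainable-graph fusion module. Everything here is an assumption for illustration only: the class name DynamicAttentionFusion, the tensor shapes, the softmax normalization, and the 1x1 projection are not taken from the authors' published implementation.

```python
import torch
import torch.nn as nn

class DynamicAttentionFusion(nn.Module):
    """Hypothetical sketch in the spirit of PG-GCN's dynamic attention module:
    a trainable graph (learnable adjacency over the joints) aggregates
    pose-stream features into the skeleton stream for early-stage fusion.
    Shapes and details are assumptions, not the paper's exact design."""

    def __init__(self, num_joints: int, channels: int):
        super().__init__()
        # Trainable graph over the joints, learned end to end with the network.
        self.graph = nn.Parameter(torch.eye(num_joints))
        # 1x1 convolution projecting the concatenated streams back to `channels`.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, skel: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # skel, pose: (batch, channels, frames, joints)
        attn = torch.softmax(self.graph, dim=-1)              # normalized edge weights
        pose_agg = torch.einsum("nctv,vw->nctw", pose, attn)  # graph-weighted pose features
        fused = torch.cat([skel, pose_agg], dim=1)            # early-stage feature fusion
        return self.proj(fused)

# Usage with NTU RGB+D-like shapes (25 joints per skeleton):
fusion = DynamicAttentionFusion(num_joints=25, channels=64)
skel = torch.randn(8, 64, 300, 25)   # batch of skeleton-stream features
pose = torch.randn(8, 64, 300, 25)   # batch of pose-stream features
out = fusion(skel, pose)             # -> (8, 64, 300, 25)
```

Because the graph is a free parameter rather than a fixed skeletal adjacency, the fusion weights between joints can be learned jointly with the rest of the network, which is the core idea the abstract attributes to the dynamic attention module.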
