Paper Title
Human Action Recognition from Various Data Modalities: A Review
Paper Authors
Paper Abstract
Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications and has therefore been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signals, which encode different sources of useful yet distinct information and offer various advantages depending on the application scenario. Consequently, many existing works have investigated different types of approaches to HAR using various modalities. In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR, organized by the type of input data modality. Specifically, we review the current mainstream deep learning methods for single and multiple data modalities, including fusion-based and co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.