Paper Title
Motion Representation Using Residual Frames with 3D CNN
Paper Authors
Paper Abstract
Recently, 3D convolutional networks (3D ConvNets) have yielded good performance in action recognition. However, an optical flow stream is still needed to ensure better performance, and its computational cost is very high. In this paper, we propose a fast but effective way to extract motion features from videos by utilizing residual frames as the input data to 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, top-1 accuracy improvements of 35.6 and 26.6 percentage points can be obtained on the UCF101 and HMDB51 datasets when ResNet-18 models are trained from scratch, achieving state-of-the-art results in this training setting. Analysis shows that better motion features can be extracted using residual frames than from their RGB counterparts. Combined with a simple appearance path, our proposal can even outperform some methods that use optical flow streams.
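The residual-frame input described in the abstract can be sketched as a simple frame-to-frame difference over a stacked RGB clip. This is a minimal illustration in NumPy; the exact preprocessing in the paper (frame sampling stride, normalization, clipping of difference values) is not specified here and is an assumption.

```python
import numpy as np

def residual_frames(clip):
    """Compute residual frames from a stacked RGB clip.

    clip: float array of shape (T, H, W, 3) -- T consecutive RGB frames.
    Returns an array of shape (T-1, H, W, 3) holding the differences
    between adjacent frames, which replace raw stacked RGB frames as
    the motion-focused input to a 3D ConvNet.
    """
    return clip[1:] - clip[:-1]

# Toy example: a clip of 8 frames at 4x4 resolution.
clip = np.random.rand(8, 4, 4, 3).astype(np.float32)
res = residual_frames(clip)
print(res.shape)  # (7, 4, 4, 3)
```

Static background pixels largely cancel in the subtraction, so the network's 3D convolutions are fed a signal dominated by motion, which is the intuition behind replacing the costly optical flow stream.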