Paper Title
Motion Representation Using Residual Frames with 3D CNN
Paper Authors
Paper Abstract
Recently, 3D convolutional networks (3D ConvNets) have yielded good performance in action recognition. However, an optical flow stream is still needed to ensure better performance, and its computational cost is very high. In this paper, we propose a fast but effective way to extract motion features from videos by utilizing residual frames as the input data to 3D ConvNets. By replacing traditional stacked RGB frames with residual ones, top-1 accuracy improvements of 35.6 and 26.6 percentage points can be obtained on the UCF101 and HMDB51 datasets when ResNet-18 models are trained from scratch, achieving state-of-the-art results in this training setting. Analysis shows that better motion features can be extracted using residual frames than from their RGB counterparts. Combined with a simple appearance path, our proposal can even outperform some methods that use optical flow streams.
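The residual-frame input described in the abstract can be sketched as a simple frame-to-frame difference over a stacked RGB clip. This is a minimal illustration in NumPy; the exact preprocessing in the paper (frame sampling stride, normalization, clipping of difference values) is not specified here and is an assumption.

```python
import numpy as np

def residual_frames(clip):
    """Compute residual frames from a stacked RGB clip.

    clip: float array of shape (T, H, W, 3) -- T consecutive RGB frames.
    Returns an array of shape (T-1, H, W, 3) holding the differences
    between adjacent frames, which replace raw stacked RGB frames as
    the motion-focused input to a 3D ConvNet.
    """
    return clip[1:] - clip[:-1]

# Toy example: a clip of 8 frames at 4x4 resolution.
clip = np.random.rand(8, 4, 4, 3).astype(np.float32)
res = residual_frames(clip)
print(res.shape)  # (7, 4, 4, 3)
```

Static background pixels largely cancel in the subtraction, so the network's 3D convolutions are fed a signal dominated by motion, which is the intuition behind replacing the costly optical flow stream.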