Paper Title
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
Paper Authors
Abstract
Localizing persons and recognizing their actions from videos is a challenging task towards high-level video understanding. Recent advances have been achieved by modeling direct pairwise relations between entities. In this paper, we go one step further: we model not only direct relations between pairs but also indirect higher-order relations established upon multiple elements. We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context. To this end, we design an Actor-Context-Actor Relation Network (ACAR-Net), which builds upon a novel High-order Relation Reasoning Operator and an Actor-Context Feature Bank to enable indirect relation reasoning for spatio-temporal action localization. Experiments on the AVA and UCF101-24 datasets show the advantages of modeling actor-context-actor relations, and visualization of attention maps further verifies that our model is capable of finding relevant higher-order relations to support action detection. Notably, our method ranks first in the AVA-Kinetics action localization task of the ActivityNet Challenge 2020, outperforming other entries by a significant margin (+6.71 mAP). Training code and models will be available at https://github.com/Siyu-C/ACAR-Net.
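To make the idea of a relation "between two actors based on their interactions with the context" more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes each actor is a feature vector and the context is a spatial grid of feature vectors. It first forms a relation feature for every (actor, context cell) pair, then lets the relation features of different actors attend to each other at the same context cell. The function names, the single weight matrix `w`, and the plain dot-product attention are all illustrative assumptions; the learned projections of a real attention layer and the Actor-Context Feature Bank are left out.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def actor_context_relations(actors, context, w):
    """First-order relations: pair each actor with every context cell.

    actors:  (N, C) one feature vector per actor
    context: (H, W, C) spatial grid of context features
    w:       (2C, D) hypothetical projection applied to each pair
    returns: (N, H, W, D) one relation feature per actor and cell
    """
    n, c = actors.shape
    h, wd, _ = context.shape
    tiled = np.broadcast_to(actors[:, None, None, :], (n, h, wd, c))
    ctx = np.broadcast_to(context[None], (n, h, wd, c))
    pairs = np.concatenate([tiled, ctx], axis=-1)  # (N, H, W, 2C)
    return np.maximum(pairs @ w, 0.0)  # linear projection followed by ReLU


def higher_order_relations(rel):
    """Attention across actors, computed separately at each context cell.

    rel:     (N, H, W, D) first-order actor-context relations
    returns: updated relations (N, H, W, D) and attention (H, W, N, N)
    """
    d = rel.shape[-1]
    # Similarity between actor i and actor j at the same cell (h, w).
    scores = np.einsum("ihwd,jhwd->hwij", rel, rel) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Each actor gathers the other actors' relations with that cell.
    mixed = np.einsum("hwij,jhwd->ihwd", attn, rel)
    return rel + mixed, attn  # residual connection keeps first-order info


rng = np.random.default_rng(0)
actors = rng.normal(size=(3, 4))      # 3 actors, 4 channels
context = rng.normal(size=(2, 2, 4))  # 2x2 context grid
w = rng.normal(size=(8, 5))           # maps 2C = 8 to D = 5

rel = actor_context_relations(actors, context, w)
out, attn = higher_order_relations(rel)
print(rel.shape, out.shape, attn.shape)
```

In this sketch, the attention weights in `attn[h, w]` indicate how strongly each actor's relation with cell `(h, w)` draws on the other actors' relations with that same cell, which is one simple way an indirect actor-context-actor link can be expressed.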