Paper Title


EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Paper Authors

Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

Paper Abstract


We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
