Paper Title

Video Panoptic Segmentation

Authors

Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Abstract

Panoptic segmentation has become a new standard of visual recognition task by unifying previous semantic segmentation and instance segmentation tasks in concert. In this paper, we propose and explore a new video extension of this task, called video panoptic segmentation. The task requires generating consistent panoptic segmentation as well as an association of instance ids across video frames. To invigorate research on this new task, we present two types of video panoptic datasets. The first is a re-organization of the synthetic VIPER dataset into the video panoptic format to exploit its large-scale pixel annotations. The second is a temporal extension on the Cityscapes val. set, by providing new video panoptic annotations (Cityscapes-VPS). Moreover, we propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames. To provide appropriate metrics for this task, we propose a video panoptic quality (VPQ) metric and evaluate our method and several other baselines. Experimental results demonstrate the effectiveness of the presented two datasets. We achieve state-of-the-art results in image PQ on Cityscapes and also in VPQ on Cityscapes-VPS and VIPER datasets. The datasets and code are made publicly available.
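For reference, the image-level panoptic quality (PQ) that the abstract refers to is the standard metric computed from matched prediction/ground-truth segment pairs (true positives TP), unmatched predictions (FP), and unmatched ground-truth segments (FN):

\[
\mathrm{PQ} = \frac{\sum_{(p,g)\in TP} \mathrm{IoU}(p,g)}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|}
\]

As a rough, non-authoritative reading of the proposed video extension, VPQ can be understood as applying this same formula to spatio-temporal segments ("tubes") obtained by linking masks that share an instance id across a window of consecutive frames, and then averaging the resulting scores over several window sizes; the exact window sizes and averaging procedure are those specified in the paper itself.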
