Paper Title
APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking
Paper Authors
Paper Abstract
Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints from a sequence of video frames. Previous animal-related datasets focus either on animal tracking or single-frame animal pose estimation, never on both. The lack of APT datasets hinders the development and evaluation of video-based animal pose estimation and tracking methods, limiting real-world applications, e.g., understanding animal behavior in wildlife conservation. To fill this gap, we take the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species, with 15 frames per video, resulting in 36,000 frames in total. After manual annotation and careful double-checking, high-quality keypoint and tracking annotations are provided for all the animal instances. Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking. From the experimental results, we gain some empirical insights and show that APT-36K provides a valuable animal pose estimation and tracking benchmark, offering new challenges and opportunities for future research. The code and dataset will be made publicly available at https://github.com/pandorgan/APT-36K.
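As a rough illustration of how per-frame keypoint annotations with tracking labels of this kind are typically consumed, the sketch below groups annotations by track ID so each animal instance becomes a short trajectory across the frames of a clip. The field names (`video_id`, `frame_idx`, `track_id`, `keypoints`) and the COCO-style keypoint layout are assumptions for illustration, not the official APT-36K schema or loader.

```python
# Minimal sketch under an assumed COCO-style schema (not the official APT-36K loader):
# group per-frame keypoint annotations by (video_id, track_id) so that each animal
# instance yields a trajectory of keypoint sets across a clip's frames.
from collections import defaultdict

# Hypothetical annotations: one dict per animal instance per frame.
annotations = [
    {"video_id": 0, "frame_idx": 0, "track_id": 1,
     "keypoints": [120.0, 85.0, 2] * 17},   # (x, y, visibility) per keypoint
    {"video_id": 0, "frame_idx": 1, "track_id": 1,
     "keypoints": [123.5, 86.0, 2] * 17},
    {"video_id": 0, "frame_idx": 0, "track_id": 2,
     "keypoints": [300.0, 140.0, 1] * 17},
]

# Collect every annotation belonging to the same animal in the same video.
tracks = defaultdict(list)
for ann in annotations:
    tracks[(ann["video_id"], ann["track_id"])].append(ann)

# Order each trajectory by frame index and report its length.
for (video_id, track_id), frames in tracks.items():
    frames.sort(key=lambda a: a["frame_idx"])
    print(f"video {video_id}, animal {track_id}: {len(frames)} annotated frames")
```

A loader along these lines would feed both benchmark tracks mentioned above: single-frame pose estimation can sample individual entries, while pose estimation with tracking consumes the grouped trajectories.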