Paper Title
Tracking Objects as Points
Paper Authors
Paper Abstract
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.3% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
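The abstract only sketches the mechanism at a high level: the network localizes object centers in the current frame and predicts, per object, an association back to the previous frame. The snippet below is a minimal, illustrative Python sketch of how such offset-based greedy association could be wired up. The data layout, the `greedy_associate` function, and the size-based distance gate are assumptions made for illustration, not the paper's reference implementation.

```python
import numpy as np

def greedy_associate(curr_dets, prev_tracks, next_track_id=0):
    """Greedily match current-frame detections to previous-frame tracks.

    Each detection carries a center (x, y), a confidence score, and a
    predicted offset pointing back toward where its center was in the
    previous frame. Matching by projected-center distance with a
    size-based gate is an illustrative assumption, not the official API.
    """
    results = []
    used = set()
    # Higher-confidence detections claim previous tracks first.
    for det in sorted(curr_dets, key=lambda d: d["score"], reverse=True):
        projected = np.array(det["center"]) + np.array(det["offset"])
        best_id, best_dist = None, det["size"]  # gate matches by object size
        for track in prev_tracks:
            if track["id"] in used:
                continue
            dist = np.linalg.norm(projected - np.array(track["center"]))
            if dist < best_dist:
                best_id, best_dist = track["id"], dist
        if best_id is None:  # no previous track nearby: start a new track
            best_id = next_track_id
            next_track_id += 1
        used.add(best_id)
        results.append({"id": best_id, "center": det["center"], "score": det["score"]})
    return results, next_track_id


# Toy usage: one detection continues an existing track, one starts a new track.
prev_tracks = [{"id": 7, "center": (100.0, 50.0)}]
curr_dets = [
    {"center": (104.0, 52.0), "offset": (-4.0, -2.0), "score": 0.9, "size": 20.0},
    {"center": (300.0, 200.0), "offset": (0.0, 0.0), "score": 0.8, "size": 20.0},
]
tracks, _ = greedy_associate(curr_dets, prev_tracks, next_track_id=8)
print(tracks)  # first detection keeps id 7, second gets new id 8
```

Because association reduces to projecting each center by its predicted offset and taking the nearest unclaimed previous track, the step adds negligible cost on top of detection, which is consistent with the real-time frame rates quoted in the abstract.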