Paper Title
Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?
Paper Authors
Paper Abstract
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first of all attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect -- even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to re-consider this commonly-neglected part of the sensor platform.
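To make the idea of a "parameter-free lifter" concrete, here is a minimal NumPy sketch of one way such a lifter could work. The abstract does not describe the mechanism, so everything here is an assumption for illustration: voxel centers are projected into each camera with a pinhole model, image features are bilinearly sampled at the projected pixels, and the samples are averaged over the cameras that actually see each voxel. The function names (`bilinear_sample`, `lift_parameter_free`), argument layout, and coordinate conventions are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def bilinear_sample(feat, u, v):
    """Sample feat (C, H, W) at float pixel coords u, v (each (V,)). Returns (V, C)."""
    _, H, W = feat.shape
    # Top-left neighbor, clipped so that the +1 neighbor stays inside the map.
    u0 = np.clip(np.floor(u), 0, W - 2).astype(int)
    v0 = np.clip(np.floor(v), 0, H - 2).astype(int)
    du = u - u0
    dv = v - v0
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u0 + 1] * du
    bot = feat[:, v0 + 1, u0] * (1 - du) + feat[:, v0 + 1, u0 + 1] * du
    return (top * (1 - dv) + bot * dv).T


def lift_parameter_free(feats, intrinsics, cam_from_ego, voxels_ego):
    """Average bilinearly sampled image features over all cameras seeing each voxel.

    Assumed (hypothetical) layout:
      feats:        (N, C, H, W) per-camera feature maps
      intrinsics:   (N, 3, 3) pinhole matrices, scaled to feature-map resolution
      cam_from_ego: (N, 4, 4) rigid transforms from ego frame to camera frame
      voxels_ego:   (V, 3) voxel centers in the ego frame
    Returns (V, C); voxels seen by no camera get zeros.
    """
    _, C, H, W = feats.shape
    V = voxels_ego.shape[0]
    homog = np.concatenate([voxels_ego, np.ones((V, 1))], axis=1)  # (V, 4)
    acc = np.zeros((V, C))
    count = np.zeros((V, 1))
    for feat, K, T in zip(feats, intrinsics, cam_from_ego):
        pts_cam = (T @ homog.T).T[:, :3]          # voxel centers in camera frame
        z = pts_cam[:, 2]
        in_front = z > 1e-3                        # discard points behind the camera
        z_safe = np.where(in_front, z, 1.0)        # avoid division by zero
        pix = (K @ pts_cam.T).T
        u = pix[:, 0] / z_safe
        v = pix[:, 1] / z_safe
        valid = in_front & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
        sampled = bilinear_sample(feat, u, v)
        acc += np.where(valid[:, None], sampled, 0.0)
        count += valid[:, None]
    return acc / np.maximum(count, 1)
```

Nothing in this sketch is learned: the mapping from image to 3D is fixed entirely by camera geometry, which is the sense of "parameter-free" assumed here. In a full model, the lifted volume would still need to be reduced along the vertical axis to obtain a 2D BEV feature map; that step is omitted.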