Poster

ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network

Zhuochen Yu · Bijie Qiu · Andy W. H. Khong


Abstract:

The sparsity of point clouds poses challenges to current LiDAR-only 3D object detection methods. Recent methods alleviate this issue by converting RGB images into virtual points via depth completion and fusing them with LiDAR points. Although these methods achieve outstanding results, they often introduce significant computational overhead due to the high density of virtual points, as well as noise from inevitable errors in depth completion. They also do not fully leverage the semantic information in images. In this paper, we propose ViKIENet (Virtual Key Instance Enhanced Network), a highly efficient and effective multi-modal feature fusion framework that fuses the features of virtual key instances (VKIs) with those of LiDAR points in multiple stages. We observe that using only VKIs improves detection performance to a degree similar to using all virtual points. ViKIENet has three main components: Semantic Key Instance Selection (SKIS), Virtual Instance Focused Fusion (VIFF) and Virtual-Instance-to-Real Attention (VIRA). ViKIENet-R and VIFF-R are extended versions of ViKIENet and VIFF that include rotationally equivariant features. ViKIENet and ViKIENet-R achieve considerable improvements in detection performance on the KITTI, JRDB and nuScenes datasets. On the KITTI dataset, ViKIENet and ViKIENet-R run at 22.7 and 15.0 FPS, respectively. Among published papers, we rank first on the KITTI car object detection and orientation estimation leaderboard and second on the KITTI car 3D object detection leaderboard.
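The abstract does not spell out how SKIS chooses key instances. As a rough illustration of why restricting fusion to key instances reduces the number of virtual points, the sketch below assumes that selection keeps only the virtual points whose image pixel falls on a foreground instance mask. The function `select_key_points`, the `(u, v, x, y, z)` point layout and the toy mask are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (u, v, x, y, z) = pixel column, pixel row, 3D coordinates.
VirtualPoint = Tuple[int, int, float, float, float]


def select_key_points(
    points: Sequence[VirtualPoint],
    instance_mask: Sequence[Sequence[int]],
) -> List[VirtualPoint]:
    """Keep virtual points whose pixel lies on a foreground instance (mask > 0)."""
    height = len(instance_mask)
    width = len(instance_mask[0]) if height else 0
    kept = []
    for u, v, x, y, z in points:
        # Points that fall outside the image or on background are discarded.
        if 0 <= v < height and 0 <= u < width and instance_mask[v][u] > 0:
            kept.append((u, v, x, y, z))
    return kept


# Toy 3x4 mask: a single instance (id 1) occupying a 2x2 block.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
# One virtual point per pixel, as a dense depth-completion output would give.
dense = [(u, v, float(u), float(v), 10.0) for v in range(3) for u in range(4)]
key_points = select_key_points(dense, mask)
print(len(dense), len(key_points))
```

In this toy case 12 dense virtual points shrink to the 4 that lie on the instance, so any later fusion stage would process a third of the points. The real selection criterion and the resulting reduction ratio may differ.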
