

Poster

Rethinking Temporal Fusion with A Unified Gradient Descent View for 3D Semantic Occupancy Prediction

Dubing Chen · Huan Zheng · Jin Fang · Xingping Dong · Xianfei Li · Wenlong Liao · Tao He · Pai Peng · Jianbing Shen


Abstract:

We present GDFusion, a temporal fusion method for vision-based 3D semantic occupancy prediction (VisionOcc). GDFusion opens up underexplored aspects of temporal fusion within the VisionOcc framework, focusing on both temporal cues and fusion strategies. It systematically examines the entire VisionOcc pipeline, identifying three fundamental yet previously overlooked temporal cues: scene-level consistency, motion calibration, and geometric complementation. These cues capture diverse facets of temporal evolution and make distinctive contributions across the various modules of the general VisionOcc framework. To effectively fuse temporal signals of different representations, we introduce a novel fusion strategy by reinterpreting vanilla RNNs. This approach applies gradient descent on features to unify the integration of diverse temporal information. Extensive experiments on nuScenes demonstrate that GDFusion significantly outperforms established baselines, delivering consistent mIoU gains of 2.2% to 4.7% with lower memory consumption. Code will be made publicly available.
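The core idea of casting temporal fusion as gradient descent on features can be illustrated with a minimal sketch. Under the assumption of a simple L2 consistency objective between current features and motion-aligned history features (an illustrative choice, not necessarily the paper's exact formulation; all names here are hypothetical), one descent step reduces to a convex blend of current and history features, mirroring a vanilla-RNN-style recurrence:

```python
import numpy as np

def gd_fusion_step(curr_feat, hist_feat, lr=0.5):
    """One gradient-descent step on the current features.

    Fusion is framed as minimizing the consistency loss
    0.5 * ||curr - hist||^2 between current features and the
    (motion-aligned) history features. The resulting update is a
    convex blend, analogous to a vanilla RNN state update.
    The L2 loss and all names are illustrative assumptions.
    """
    grad = curr_feat - hist_feat   # gradient of 0.5*||curr - hist||^2 w.r.t. curr
    return curr_feat - lr * grad   # equals (1 - lr)*curr + lr*hist

# Toy usage: fusing a 2-element feature vector with its history.
curr = np.array([1.0, 0.0])
hist = np.array([0.0, 1.0])
fused = gd_fusion_step(curr, hist, lr=0.5)
# With lr = 0.5 this is the elementwise average of curr and hist.
```

The learning rate plays the role of an RNN gating coefficient: lr = 0 keeps only the current features, lr = 1 copies the history, and intermediate values trade off the two.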
