Gau-Occ: Geometry-Completed Gaussians for Multi-Modal 3D Occupancy Prediction
Abstract
3D semantic occupancy prediction is crucial for autonomous driving, yet vision-only approaches suffer from weak geometric cues, while existing multi-modal frameworks often depend on dense voxel or BEV tensors that impose heavy computational cost. We present Gau-Occ, a multi-modal framework that models the scene as a compact collection of semantic 3D Gaussians, enabling geometry-guided fusion without dense volumetric processing. To enhance geometric completeness, a learned LiDAR Completion Diffuser (LCD) trained on real-world priors recovers missing structures from sparse LiDAR, and the completed points are encoded as semantic Gaussian anchors. To integrate multi-view image semantics, we introduce Gaussian Anchor Fusion (GAF), a geometry-aligned aggregation module that performs anchor-guided 2D sampling, local neighborhood encoding, and cross-modal alignment. By constructing locally aggregated Gaussian descriptors that capture spatial consistency and semantic discriminability, GAF enables accurate feature association across modalities. Through anchor-driven refinement of Gaussian attributes, Gau-Occ supports detailed 3D occupancy prediction. Extensive experiments on challenging benchmarks demonstrate that Gau-Occ achieves state-of-the-art performance.
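As a concrete illustration of the anchor-guided 2D sampling step described for GAF, the sketch below projects 3D Gaussian anchor centers into a single camera view and bilinearly samples per-anchor image features. This is a minimal sketch under assumed conventions (pinhole intrinsics, world-to-camera extrinsics, a single feature map); the function name, tensor shapes, and validity handling are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def sample_anchor_features(anchors, feats, K, T, img_hw):
    """Anchor-guided 2D sampling (illustrative sketch, not the paper's code).

    anchors: (N, 3) Gaussian anchor centers in world/ego coordinates
    feats:   (C, H, W) feature map from one camera view
    K:       (3, 3) pinhole camera intrinsics
    T:       (4, 4) world-to-camera extrinsic transform
    img_hw:  (H_img, W_img) original image size, used to normalize pixels
    """
    N = anchors.shape[0]
    # Homogeneous world -> camera coordinates.
    pts = torch.cat([anchors, torch.ones(N, 1)], dim=1)   # (N, 4)
    cam = (T @ pts.T).T[:, :3]                            # (N, 3)
    valid = cam[:, 2] > 1e-3                              # in front of camera
    # Perspective projection to pixel coordinates (u, v).
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-3)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    H_img, W_img = img_hw
    grid = torch.stack([uv[:, 0] / (W_img - 1),
                        uv[:, 1] / (H_img - 1)], dim=1) * 2.0 - 1.0
    # Bilinear sampling: feats (1, C, H, W), grid (1, N, 1, 2) -> (1, C, N, 1).
    sampled = F.grid_sample(feats[None], grid[None, :, None, :],
                            mode="bilinear", align_corners=True)[0, :, :, 0].T
    # Zero out features of anchors that fall behind the camera.
    return sampled * valid[:, None].float(), valid
```

In a multi-view setting, the same sampling would be repeated per camera and the per-view features aggregated per anchor, which is where the local neighborhood encoding and cross-modal alignment of GAF would operate.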