Multi-Scale Gaussian-Language Map for Embodied Navigation and Reasoning
Abstract
Understanding the geometric and semantic structure of an environment is essential for embodied agents. Existing semantic mapping methods trade off explicit geometry against multi-scale semantics, and lack a native interface to large models, thus requiring additional training of feature projections for semantic alignment. To this end, we propose the multi-scale Gaussian-Language Map (GLMap), which introduces three key designs: (1) explicit geometry, (2) multi-scale semantics covering both instance-level and region-level concepts, and (3) a dual-modality interface in which each semantic unit jointly stores a natural language description and a 3D Gaussian representation. The 3D Gaussians enable compact storage and fast rendering of task-relevant images via Gaussian splatting. To enable efficient incremental construction, we further propose a Gaussian Estimator that analytically derives Gaussian parameters from dense point clouds without gradient-based optimization. Experiments on the ObjectNav, InstNav, and SQA tasks show that GLMap improves target localization and contextual reasoning while remaining compatible with large-model-based methods in a zero-shot manner.
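
To make the two central ideas concrete, the sketch below illustrates, under our own assumptions rather than the paper's released code, (a) a dual-modality semantic unit that pairs a natural language description with 3D Gaussian parameters, and (b) an analytic, optimization-free Gaussian fit that derives the center from the sample mean and the axes and scales from an eigendecomposition of the sample covariance. All names (SemanticUnit, fit_gaussian) are hypothetical.

    # Minimal sketch, assuming NumPy; not the paper's actual implementation.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SemanticUnit:                 # hypothetical container
        description: str                # natural language side of the interface
        mean: np.ndarray                # (3,) Gaussian center
        rotation: np.ndarray            # (3, 3) principal axes as columns
        scales: np.ndarray              # (3,) per-axis standard deviations

    def fit_gaussian(points: np.ndarray, description: str) -> SemanticUnit:
        """Derive Gaussian parameters analytically from an (N, 3) point cloud."""
        mean = points.mean(axis=0)                       # closed-form center
        cov = np.cov(points, rowvar=False)               # 3x3 sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)           # axes and variances
        scales = np.sqrt(np.clip(eigvals, 1e-12, None))  # numerically safe stds
        return SemanticUnit(description, mean, eigvecs, scales)

    # Example: fit one unit to a synthetic, anisotropic point cloud.
    pts = np.random.default_rng(0).normal(size=(500, 3)) * [2.0, 1.0, 0.5]
    unit = fit_gaussian(pts, "a blue armchair near the window")

Because every step is a closed-form matrix operation, such a fit runs in a single pass over the points, which is what makes gradient-free, incremental map construction plausible.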