OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
Hao Shi ⋅ Ze Wang ⋅ Shangwei Guo ⋅ Mengfei Duan ⋅ Song Wang ⋅ Teng Chen ⋅ Kailun Yang ⋅ Lin Wang ⋅ Kaiwei Wang
Abstract
Robust 3D semantic occupancy prediction is essential for legged and humanoid robots, yet most Semantic Scene Completion (SSC) systems are built for wheeled platforms with forward-facing sensors. We present $\textbf{OneOcc}$, a vision-only panoramic SSC framework designed to handle severe body jitter and to exploit full $360^{\circ}$ continuity. OneOcc integrates four complementary modules: (i) $\textit{Dual-Projection fusion (DP-ER)}$, which jointly exploits the raw annular panorama and its equirectangular unfolding to preserve true $360^{\circ}$ continuity while enabling grid-aligned feature extraction and seam-aware context; (ii) $\textit{Bi-Grid Voxelization (BGV)}$, which reasons in Cartesian and polar/cylindrical voxel spaces to reduce discretization bias and better align with panoramic geometry, yielding sharper free/occupied boundaries (see the sketch below); (iii) a lightweight decoder with $\textit{Hierarchical AMoE-3D}$ fusion that dynamically routes multi-scale 3D features to specialized experts, improving long-range context and occlusion handling; and (iv) a plug-and-play $\textit{Gait Displacement Compensation (GDC)}$ module that learns feature-level motion correction from gait, stabilizing representations without extra sensors. We also release two panoramic occupancy benchmarks: $\textbf{QuadOcc}$ (real quadruped, first-person $360^{\circ}$) and $\textbf{Human360Occ (H3O)}$ (CARLA human-ego $360^{\circ}$ with RGB, depth, and semantic-occupancy labels, plus standardized within-/cross-city splits). OneOcc sets a new state of the art: on QuadOcc it surpasses strong vision baselines and even popular LiDAR methods, and on H3O it improves within-city performance by +3.83 mIoU and cross-city performance by +8.08 mIoU. All modules are lightweight, enabling deployable full-surround semantic perception for legged and humanoid robots. Datasets and code will be released upon publication.
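To make the Bi-Grid Voxelization idea of module (ii) concrete, below is a minimal NumPy sketch of dual-grid indexing: the same point set is discretized into both a Cartesian grid and a cylindrical (radius/azimuth/height) grid whose azimuth axis wraps, mirroring the panorama's $360^{\circ}$ continuity. The function names, grid resolutions, and bounds here are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def voxelize_cartesian(points, grid_shape=(128, 128, 16),
                       bounds=((-10.0, 10.0), (-10.0, 10.0), (-2.0, 2.0))):
    """Map an (N, 3) array of (x, y, z) points to Cartesian voxel indices."""
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    shape = np.array(grid_shape)
    idx = np.floor((points - lo) / (hi - lo) * shape).astype(np.int64)
    valid = np.all((idx >= 0) & (idx < shape), axis=1)  # drop out-of-range points
    return idx[valid]

def voxelize_cylindrical(points, grid_shape=(128, 360, 16),
                         r_max=10.0, z_bounds=(-2.0, 2.0)):
    """Map points to (radius, azimuth, height) voxel indices; the azimuth
    axis wraps around, matching the seamless 360-degree panorama."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.hypot(x, y)
    theta = np.mod(np.arctan2(y, x), 2.0 * np.pi)  # wrap azimuth into [0, 2*pi)
    r_idx = np.floor(r / r_max * grid_shape[0]).astype(np.int64)
    t_idx = np.floor(theta / (2.0 * np.pi) * grid_shape[1]).astype(np.int64)
    t_idx %= grid_shape[1]  # azimuth wraps rather than being clipped
    z_idx = np.floor((z - z_bounds[0]) / (z_bounds[1] - z_bounds[0])
                     * grid_shape[2]).astype(np.int64)
    valid = (r_idx < grid_shape[0]) & (z_idx >= 0) & (z_idx < grid_shape[2])
    return np.stack([r_idx, t_idx, z_idx], axis=1)[valid]

# Example: the same points land in both discretizations; a downstream
# fusion head could then combine per-voxel evidence from the two grids.
pts = np.random.uniform(-5.0, 5.0, size=(1000, 3)) * np.array([1.0, 1.0, 0.3])
cart_idx = voxelize_cartesian(pts)
cyl_idx = voxelize_cylindrical(pts)
print(cart_idx.shape, cyl_idx.shape)
```

The intuition this sketch captures: near the robot, cylindrical cells are small and angularly aligned with the panoramic rays, while the Cartesian grid keeps uniform resolution at range, so reasoning in both spaces plausibly reduces the discretization bias of either alone.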