More than the Sum: Panorama-Language Models for Adverse Omni-Scenes
Abstract
Most existing vision-language models (VLMs) are tailored for pinhole imagery, stitching multiple narrow field-of-view inputs to piece together a complete omni-scene understanding. Yet such multi-view perception overlooks the holistic spatial and contextual relationships that a single panorama inherently preserves. In this work, we argue that panoramic vision-language understanding is more than the sum of its pinhole counterparts. We introduce Panorama-Language Modeling (PLM), a unified framework for 360° vision-language reasoning. In addition, we present PanoVQA, a large-scale panoramic VQA dataset that covers diverse and adverse omni-scenes, enabling comprehensive reasoning under occlusions, accidents, and other challenging conditions. To establish a foundation for PLM, we develop a plug-and-play panoramic adaptation module that allows existing pinhole-based VLMs to process equirectangular panoramas without retraining. Extensive experiments demonstrate that PLM achieves superior robustness and holistic reasoning in adverse omni-scenes, revealing that a full panorama yields understanding greater than the sum of its parts. All datasets and code will be publicly released.