H$^{2}$A$^{2}$: Homogeneity-Aware and Heterogeneity-Aware Feature Perception for Unified Indoor 3D Object Detection
Tao Xie ⋅ Tao An ⋅ Feng Liu ⋅ Jin Wensheng ⋅ Zhengyu Li ⋅ Lijun Zhao ⋅ Ruifeng Li
Abstract
In this work, we observe that for indoor 3D object detection, fundamental geometric cues induce homogeneous spatial responses across scenes, whereas scene-specific structure yields heterogeneous signatures. However, existing detectors lack effective mechanisms to jointly extract and exploit these dual properties, which inherently limits detection performance. Guided by this insight, we propose H$^2$A$^2$, a homogeneity-aware and heterogeneity-aware feature perception network for unified indoor 3D object detection under cross-scene training paradigms. Technically, we introduce a structural-feature-aware kernel selection (SF-KS) method, which comprises three core components: (i) task-aware linear modulation, a channel-wise affine transformation that strengthens scene-structural feature representation; (ii) a kernel weight selection strategy that integrates an offset validity prior to suppress non-informative cross-scene transfer while using a structural consistency posterior to capture scene-homogeneous cues; and (iii) task-aware channel gating that suppresses scene-irrelevant feature responses. Overall, SF-KS enables precise optimization of homogeneous features while specializing scene-specific heterogeneous ones. In addition, to stabilize cross-scene optimization, we introduce a norm-based gradient homogenization (NGH) algorithm, which normalizes and dynamically reweights per-task gradient norms to mitigate conflicts and promote consistent updates. Extensive experiments on diverse indoor benchmarks show that H$^2$A$^2$ delivers consistent gains over strong baselines and improves cross-scene generalization.
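To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows a generic channel-wise affine modulation, a sigmoid channel gate, and one simple form of gradient-norm equalization. All function names, the choice of the mean norm as the common target, and the plain averaging of rescaled gradients are assumptions made here for illustration; the abstract does not specify the exact rules used by SF-KS or NGH, and the kernel weight selection step is not sketched because its details are not given.

```python
import math

def modulate(features, gamma, beta):
    """Channel-wise affine transform: y[c] = gamma[c] * x[c] + beta[c].

    `features` is a list of per-point feature rows; `gamma` and `beta`
    are per-channel parameters (assumed here to be supplied per task).
    """
    return [[g * x + b for x, g, b in zip(row, gamma, beta)] for row in features]

def gate(features, logits):
    """Channel gating: scale each channel by sigmoid(logit), a value in (0, 1)."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[x * s for x, s in zip(row, gates)] for row in features]

def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(v * v for v in vec))

def homogenize_gradients(task_grads, eps=1e-12):
    """Rescale every task gradient to the mean norm, then average them.

    Assumption: the common target is the mean of the per-task norms;
    the paper's dynamic reweighting rule may differ.
    """
    norms = [l2_norm(g) for g in task_grads]
    target = sum(norms) / len(norms)
    scaled = [[v * target / (n + eps) for v in g]
              for g, n in zip(task_grads, norms)]
    dim = len(task_grads[0])
    return [sum(g[i] for g in scaled) / len(scaled) for i in range(dim)]

if __name__ == "__main__":
    feats = modulate([[1.0, 2.0]], gamma=[2.0, 0.5], beta=[0.0, 1.0])
    print(gate(feats, logits=[0.0, 0.0]))
    print(homogenize_gradients([[3.0, 4.0], [0.0, 0.5]]))
```

In this sketch, a task whose gradient norm is ten times larger than another's no longer dominates the shared update, since both are brought to the same norm before being combined.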