Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution Can Improve Robust Scene Graph Generation
Abstract
In this paper, we propose Robo-SGG, a plug-and-play module for robust scene graph generation (SGG). Unlike standard SGG, robust SGG aims to perform inference on a diverse range of corrupted images; its core challenge is the domain shift between clean and corrupted images. Existing SGG methods suffer degraded performance because their visual features shift under such conditions (e.g., corruption interference or occlusions). To obtain robust visual features, we leverage layout information, which represents the global structure of an image and is robust to domain shift. Specifically, we employ Instance Normalization (IN) to suppress domain-specific variations, and then recover the robust structural features (i.e., the positional and semantic relationships among objects) with the proposed Layout-Oriented Restitution. Furthermore, to handle corrupted images, we introduce a Layout-Embedded Encoder (LEE) that adaptively fuses layout and visual features via a gating mechanism, enhancing the robustness of positional and semantic representations for objects and predicates. Robo-SGG is designed as a plug-and-play component that can be easily integrated into any baseline SGG model. Extensive experiments demonstrate that integrating Robo-SGG into the state-of-the-art method yields relative improvements of 6.3%, 11.1%, and 8.0% in mR@50 on the PredCls, SGCls, and SGDet tasks of the VG-C benchmark, respectively, and achieves new state-of-the-art performance on the corrupted scene graph generation benchmarks (VG-C and GQA-C). We will release our source code and model.
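To make the two mechanisms named above concrete, the following is a minimal NumPy sketch of instance normalization followed by a layout-conditioned restitution step, and of a gated fusion of layout and visual features. The abstract does not specify the exact form of either component, so the functions `restitute` and `gated_fusion`, their weight matrices, and all tensor shapes are illustrative assumptions rather than the actual Robo-SGG design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def instance_norm(x, eps=1e-5):
    """Standard instance normalization (without affine parameters).

    x has shape (N, C, H, W); each (sample, channel) map is normalized
    over its own spatial dimensions.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def restitute(x, layout_vec, w):
    """Hypothetical restitution step (an assumption, not the paper's formula).

    IN discards some information along with the domain-specific statistics.
    Here a per-channel weight predicted from a layout vector decides how
    much of the removed residual is added back.
    x: (N, C, H, W), layout_vec: (N, D), w: (D, C).
    """
    x_in = instance_norm(x)
    residual = x - x_in
    channel_gate = sigmoid(layout_vec @ w)[:, :, None, None]  # (N, C, 1, 1)
    return x_in + channel_gate * residual


def gated_fusion(visual, layout, w_gate, b_gate):
    """Hypothetical gated fusion of visual and layout features.

    visual, layout: (N, D); w_gate: (2D, D); b_gate: (D,).
    The gate lies in (0, 1), so the output is an element-wise convex
    combination of the two inputs.
    """
    gate = sigmoid(np.concatenate([visual, layout], axis=1) @ w_gate + b_gate)
    return gate * visual + (1.0 - gate) * layout


# Toy inputs; all shapes and random weights are placeholders.
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 8))
layout_vec = rng.normal(size=(2, 6))

normed = instance_norm(feats)
restored = restitute(feats, layout_vec, rng.normal(size=(6, 4)))

visual = rng.normal(size=(2, 5))
layout = rng.normal(size=(2, 5))
fused = gated_fusion(visual, layout, rng.normal(size=(10, 5)), np.zeros(5))
```

In this sketch, a gate value near 1 keeps the visual feature and a value near 0 falls back to the layout feature, which is one plausible way for a model to rely more on layout when visual evidence is corrupted.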