Frequency-Aware Affinity for Weakly Supervised Semantic Segmentation
Abstract
Weakly Supervised Semantic Segmentation (WSSS) typically relies on Class Activation Maps (CAMs) for pixel-wise localization. However, CAMs tend to activate only the most discriminative regions, leading to suboptimal WSSS performance. Although existing CAM refinement methods leverage pairwise affinity relations to expand the activated regions, affinities derived from Vision Transformers (ViTs) exhibit a smoothing property: they neglect crucial high-frequency relations and therefore fail to refine object boundaries accurately. In this work, we propose the Dual Frequency-Aware (DFA) framework to address this limitation. Specifically, the Low-Frequency-Aware Alignment (LFAA) module generates a low-frequency-aware affinity that captures salient semantic relations to enhance semantic consistency within object interiors on CAMs, while the High-Frequency-Aware Rectification (HFAR) module produces a high-frequency-aware affinity that models precise relations to preserve object boundary structure on CAMs. By integrating these two complementary affinities, we design a novel Frequency-Guided (FG) CAM Generation scheme based on Optimal Transport theory, which largely dispenses with the complex refinement process. Extensive experiments demonstrate that our DFA framework achieves state-of-the-art performance on both the PASCAL VOC and MS COCO benchmarks. Code will be released.
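To make the smoothing property concrete, the following minimal sketch propagates a toy one-dimensional CAM through a softmax affinity. It is an illustration under stated assumptions, not the DFA implementation: the helper names `softmax_affinity` and `propagate`, the random token features, and the token counts are all hypothetical. Because every row of a softmax affinity is a set of strictly positive weights summing to one, each refined value is an average of the input values, so a sharp 1-to-0 boundary is necessarily blurred.

```python
import math
import random


def softmax_affinity(features):
    """Row-stochastic affinity from scaled dot-product similarity (hypothetical helper)."""
    scale = 1.0 / math.sqrt(len(features[0]))
    affinity = []
    for query in features:
        logits = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        affinity.append([w / total for w in weights])
    return affinity


def propagate(affinity, cam):
    """Refine a flattened CAM by one step of affinity-weighted averaging."""
    return [sum(a * c for a, c in zip(row, cam)) for row in affinity]


random.seed(0)
# Assumption: 8 tokens with 4-dimensional random features stand in for ViT tokens.
features = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
affinity = softmax_affinity(features)

# Toy CAM with a sharp object boundary after the third token.
cam = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
refined = propagate(affinity, cam)

# The value range can only shrink under averaging, so the boundary contrast drops.
spread_before = max(cam) - min(cam)
spread_after = max(refined) - min(refined)
print(f"contrast before: {spread_before:.3f}, after: {spread_after:.3f}")
```

The reduced contrast after a single propagation step is the loss of high-frequency detail that motivates treating low- and high-frequency relations separately.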