Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation
Abstract
Current zero-shot camouflaged object segmentation methods typically employ a two-stage discover-then-segment pipeline: a multimodal large language model (MLLM) produces visual prompts, which the Segment Anything Model (SAM) then converts into masks. However, relying solely on MLLMs for camouflaged object discovery often leads to inaccurate localization, false positives, and missed detections. To address these issues, we propose the Discover-Segment-Select (DSS) mechanism, a three-stage framework that progressively refines the segmentation process. DSS comprises a Feature-driven Object Discovery (FOD) module that leverages visual features to generate diverse object proposals, a segmentation module that refines these proposals through SAM segmentation, and a Semantic-driven Mask Selection (SMS) module that employs MLLMs to evaluate and select the optimal segmentation mask from multiple candidates. Extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art performance with lower GPU memory consumption.
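
The control flow of the three stages described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `discover_segment_select` and the three callables `discover`, `segment`, and `select` are hypothetical placeholders standing in for the FOD module, SAM, and the SMS module respectively, and the convention that `select` returns an index into the candidate list is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Image = TypeVar("Image")
Prompt = TypeVar("Prompt")
Mask = TypeVar("Mask")


def discover_segment_select(
    image: Image,
    discover: Callable[[Image], Sequence[Prompt]],       # hypothetical stand-in for FOD
    segment: Callable[[Image, Prompt], Mask],            # hypothetical stand-in for SAM
    select: Callable[[Image, Sequence[Mask]], int],      # hypothetical stand-in for SMS
) -> Optional[Mask]:
    """Run the three stages in order and return the chosen mask, if any."""
    # Stage 1 (discover): propose several candidate prompts from visual features.
    prompts = discover(image)
    if not prompts:
        return None  # nothing was proposed, so there is nothing to segment

    # Stage 2 (segment): turn every proposal into a candidate mask.
    candidates: List[Mask] = [segment(image, prompt) for prompt in prompts]

    # Stage 3 (select): let the selector pick one candidate by index (assumed convention).
    best_index = select(image, candidates)
    return candidates[best_index]
```

In this sketch the selector sees all candidate masks at once, which reflects the idea of choosing among multiple candidates rather than trusting a single prompt; how the actual modules score proposals and masks is not specified here.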