PromptMoE: A Segmentation Refinement Framework Leveraging Mixture of Experts for Improved Prompting
Abstract
High-quality segmentations are critical in vision tasks where boundary accuracy matters (e.g., medical diagnostics and quality control). Recently, promptable vision models have emerged as effective backbones for segmentation refinement frameworks. However, their performance hinges not only on prompt quality but also on overcoming noisy input masks and semantically ambiguous outputs from the promptable model. Existing prompt-based refiners rely on fixed prompting rules, making them brittle to changing failure modes and to new tasks or domains. We propose \MOE{}, a model-agnostic, MoE-driven prompting refiner that is effective for segmentation refinement across tasks and domains. \MOE{} refines an initial mask with three collaborative modules: the MoE-based Image-Informed Prompting (IIP) module takes an image and a coarse mask and produces a set of expert score maps to guide prompt generation; the Dynamic Expert Selector (DES) activates only the most relevant experts and fuses their maps, avoiding dense evaluation and signal dilution; and the Prompt-Placement Explorer (PPE) searches the fused guidance map to place high-confidence, spatially diverse point prompts. Across five benchmark datasets (BIG, VOC, DAVIS585, ECSSD, MSRA-B), \MOE{} achieves statistically significant gains over the state-of-the-art methods CascadePSP, SegRefiner, and SAMRefiner on semantic, instance, and salient object segmentation tasks, with mean improvements of +6.24 IoU / +8.99 BIoU.
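The three-stage flow described above (IIP expert score maps, DES top-k gating and fusion, PPE diverse point placement) can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: the expert features, the mean-activation gating score, and the greedy peak-suppression placement are all simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def iip_expert_maps(image, coarse_mask, num_experts=4, rng=None):
    """IIP stand-in: each 'expert' emits a per-pixel score map.
    Here experts are just random linear mixes of image/mask channels
    (an assumption; the real experts are learned)."""
    rng = rng or np.random.default_rng(0)
    feats = np.stack([image, coarse_mask], axis=0)        # (2, H, W)
    weights = rng.normal(size=(num_experts, 2))           # (E, 2)
    return np.einsum("ec,chw->ehw", weights, feats)       # (E, H, W)

def des_fuse(maps, top_k=2):
    """DES stand-in: score each expert (here by mean activation),
    keep only the top-k, and fuse them with softmax weights so the
    remaining experts neither get densely evaluated nor dilute each
    other's signal."""
    scores = maps.mean(axis=(1, 2))                       # (E,)
    idx = np.argsort(scores)[-top_k:]                     # top-k experts
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return np.einsum("e,ehw->hw", w, maps[idx])           # (H, W)

def ppe_place_points(fused, num_points=3, min_dist=4):
    """PPE stand-in: greedily pick the highest-confidence peaks,
    suppressing a neighborhood around each chosen point to keep
    the prompts spatially diverse."""
    g = fused.copy()
    points = []
    for _ in range(num_points):
        y, x = np.unravel_index(np.argmax(g), g.shape)
        points.append((int(y), int(x)))
        g[max(0, y - min_dist):y + min_dist + 1,
          max(0, x - min_dist):x + min_dist + 1] = -np.inf
    return points

rng = np.random.default_rng(42)
image = rng.random((32, 32))
coarse = (rng.random((32, 32)) > 0.5).astype(float)       # noisy initial mask
maps = iip_expert_maps(image, coarse, rng=rng)
fused = des_fuse(maps, top_k=2)
prompts = ppe_place_points(fused, num_points=3, min_dist=4)
```

The resulting `prompts` would be fed as point prompts to a promptable segmentation backbone; the suppression radius in the PPE stand-in is what enforces spatial diversity among the selected peaks.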