FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching
Andranik Sargsyan ⋅ Shant Navasardyan
Abstract
Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical analysis. In recent years, Dichotomous Image Segmentation (DIS) has become the standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or to fully capture the semantic structure of the foreground. To address these challenges, we present $\textbf{FlowDIS}$, a novel dichotomous image segmentation method built upon the flow matching framework, which learns a time-dependent vector field to transport the image distribution into the corresponding mask distribution under optional textual guidance. Moreover, with our $\textbf{Position-Aware Instance Pairing (PAIP)}$ training strategy, FlowDIS offers strong controllability through textual prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared to the second-best DIS method, FlowDIS achieves a $\textbf{5.5}$% $\textbf{higher}$ $F_\beta^\omega$ measure and a $\textbf{43}$% $\textbf{better MAE}$ ($\mathcal{M}$) on the DIS-TE test set. The code will be released upon publication.
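As a rough illustration of the flow matching idea mentioned above, the following sketch uses a straight-line path between an image sample and its mask sample. This is an assumption made for illustration: the abstract does not specify the probability path, the network, the text conditioning, or the space in which FlowDIS operates, and the function names here are hypothetical rather than taken from the FlowDIS code.

```python
# Minimal flow matching sketch on plain Python lists (no external dependencies).
# Assumption: a straight-line path x_t = (1 - t) * x0 + t * x1, where x0 stands
# for an image sample and x1 for the corresponding mask sample.

def interpolate(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it is constant in t and equals x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(v_pred, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    v_tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_tgt)) / len(v_tgt)

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy usage: an "oracle" vector field that already equals the target velocity
# transports the image sample onto the mask sample.
image = [0.2, 0.8, 0.5]
mask = [0.0, 1.0, 1.0]
oracle = lambda x, t: target_velocity(image, mask)
predicted_mask = euler_sample(image, oracle, steps=10)
```

In a trained model, `velocity_fn` would be a neural network fitted by minimizing a loss such as `flow_matching_loss` at points returned by `interpolate` for randomly sampled `t`; the oracle above only stands in for that network.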