IVAAN: Instance-level Vision-Language Alignment via Attribute-Guided Text Prompts Generation for Nuclei Analysis
Abstract
Nuclei instance segmentation and classification are fundamental tasks in pathology but remain challenging because of severe class imbalance and organ- and stain-induced variability. While vision–language approaches can inject explicit semantic cues that reduce spurious contextual bias under imbalance, the absence of instance-level textual annotations has limited their utility for nucleus-level analysis. We introduce an instance-level vision–language framework that derives attribute-guided textual descriptions from ground-truth masks. We then align visual representations with these semantic text anchors via contrastive learning, coupling morphology with semantics at the instance level. To capture intra-class variation while maintaining organ-consistent class semantics, we learn multiple class-specific tokens that act as prototypes, each representing a distinct submode within a class and summarizing morphologically similar nuclei. Our approach improves both segmentation and classification without manual text labels, indicating that language-guided instance alignment combined with prototype-based semantic feedback yields more discriminative and generalizable nuclei representations.
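
As an illustration of how a textual description might be derived from a ground-truth mask, the following minimal sketch turns one binary instance mask into a short prompt. It is not the method's actual attribute set or template: the function name `describe_nucleus`, the two attributes (pixel area and bounding-box aspect ratio), the thresholds, and the prompt wording are all assumptions made for this example.

```python
import numpy as np


def describe_nucleus(mask: np.ndarray, class_name: str) -> str:
    """Build a text prompt from simple shape attributes of one binary instance mask.

    Hypothetical helper: attributes, thresholds, and template are illustrative only.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty instance mask")

    # Attribute 1: area in pixels.
    area = int(ys.size)

    # Attribute 2: bounding-box aspect ratio as a crude elongation measure.
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    aspect = max(height, width) / min(height, width)

    # Map numeric attributes to words (thresholds are arbitrary assumptions).
    size_word = "small" if area < 100 else "large"
    shape_word = "round" if aspect < 1.5 else "elongated"

    return f"a {size_word}, {shape_word} {class_name} nucleus"


# Usage: a 4 x 20 pixel rectangle standing in for a ground-truth instance mask.
mask = np.zeros((32, 32), dtype=bool)
mask[10:14, 5:25] = True
prompt = describe_nucleus(mask, "epithelial")
print(prompt)  # a small, elongated epithelial nucleus
```

In a full pipeline, prompts of this kind would be encoded by a text encoder and serve as the semantic anchors for the contrastive alignment; that step is omitted here.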