Poster
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen · Zelin Peng · Yu Huang · Xiaokang Yang · Wei Shen
Domain generalization (DG) aims to train a model on source domains that generalizes well to unseen domains. Recent Vision-Language Models (VLMs), such as CLIP, exhibit remarkable generalization across a wide range of data distributions, benefiting tasks like DG. However, CLIP is pre-trained by aligning images with their descriptions, which inevitably captures domain-specific details. Moreover, adapting CLIP to source domains with limited feature diversity introduces bias. These limitations hinder the model's ability to generalize across domains. In this paper, we propose a new DG approach that learns with diverse text prompts. These text prompts incorporate varied contexts to imitate different domains, enabling the DG model to learn domain-invariant features. The text prompts guide DG model learning in three aspects: feature suppression, which uses the prompts to identify domain-sensitive features and suppress them; feature consistency, which ensures the model's features are robust to the domain variations imitated by the diverse prompts; and feature diversification, which diversifies features based on the prompts to mitigate bias. Experimental results show that our approach improves domain generalization performance on five datasets from the DomainBed benchmark, achieving state-of-the-art results.
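The abstract does not give the paper's exact objectives, but the core idea can be sketched: encode the same class names under varied prompt contexts that imitate domains, then use the spread of those text features to regularize image features. Below is a minimal, hypothetical PyTorch sketch using the OpenAI CLIP package; the class list, context templates, temperature, and the specific forms of the suppression and consistency losses are all illustrative assumptions, not the authors' formulation (the diversification term is omitted for brevity).

# Hypothetical sketch of learning with diverse text prompts for DG with CLIP.
# Loss forms are illustrative assumptions, not the paper's exact objectives.
# Requires: pip install torch git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

CLASSES = ["dog", "elephant", "giraffe"]                # assumed label set
CONTEXTS = ["a photo of a {}", "a sketch of a {}",      # varied contexts that
            "a painting of a {}", "a cartoon of a {}"]  # imitate domains

# One text feature per (context, class) pair: [n_ctx, n_cls, d]
with torch.no_grad():
    tokens = clip.tokenize([c.format(n) for c in CONTEXTS for n in CLASSES]).to(device)
    text_feats = F.normalize(model.encode_text(tokens).float(), dim=-1)
    text_feats = text_feats.view(len(CONTEXTS), len(CLASSES), -1)

def prompt_guided_losses(image_feats, labels):
    """image_feats: [B, d] CLIP image features; labels: [B] class indices."""
    img = F.normalize(image_feats, dim=-1)

    # Class logits under each imitated domain context: [n_ctx, B, n_cls]
    logits = torch.einsum("bd,kcd->kbc", img, text_feats) / 0.01

    # Feature consistency (assumed form): predictions should agree across
    # imitated domains, measured against the context-averaged prediction.
    mean_prob = logits.softmax(-1).mean(0)
    loss_cons = sum(F.kl_div(l.log_softmax(-1), mean_prob, reduction="batchmean")
                    for l in logits) / len(CONTEXTS)

    # Feature suppression (assumed form): the spread of the ground-truth
    # class's text features across contexts marks domain-sensitive directions;
    # penalize the image feature's projection onto those directions.
    cls_feats = text_feats[:, labels]                       # [n_ctx, B, d]
    domain_dir = F.normalize(cls_feats - cls_feats.mean(0), dim=-1)
    loss_supp = (torch.einsum("bd,kbd->kb", img, domain_dir) ** 2).mean()

    # Standard classification on the context-averaged logits.
    loss_cls = F.cross_entropy(logits.mean(0), labels)
    return loss_cls + loss_cons + loss_supp

In training, image_feats would come from model.encode_image on source-domain batches (with a trainable adapter or fine-tuned encoder), and the three terms would be weighted; those weights and the adaptation scheme are likewise unspecified in the abstract.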