Poster
Task-Aware Clustering for Prompting Vision-Language Models
Fusheng Hao · Fengxiang He · Fuxiang Wu · Tichao Wang · Chengqun Song · Jun Cheng
Prompt learning has attracted widespread attention for adapting vision-language models to downstream tasks. Existing methods largely rely on optimization strategies to ensure the task-awareness of learnable prompts. Due to the scarcity of task-specific data, they are prone to overfitting, and the resulting prompts often generalize poorly or exhibit limited task-awareness. To address this issue, we propose a novel Task-Aware Clustering (TAC) framework for prompting vision-language models, which increases the task-awareness of learnable prompts by introducing task-aware pre-context. The key ingredients are as follows: (a) generating task-aware pre-context via task-aware clustering, which preserves the backbone structure of a downstream task with only a few clustering centers; (b) enhancing the task-awareness of learnable prompts by enabling them to interact with the task-aware pre-context through the well-pretrained encoders; and (c) preventing the visual task-aware pre-context from interfering with the interaction among patch embeddings via a masked attention mechanism. Extensive experiments are conducted on benchmark datasets, covering the base-to-novel generalization, domain generalization, and cross-dataset transfer settings. Ablation studies validate the effectiveness of the key ingredients, and comparative results show the superiority of our TAC over competitive counterparts. The code will be made publicly available.
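To make ingredients (a) and (c) concrete, below is a minimal, hypothetical sketch rather than the authors' implementation: plain k-means over encoder features stands in for task-aware clustering, its centers serve as visual pre-context tokens, and an additive attention mask blocks patch tokens from attending to the pre-context so that it cannot interfere with patch-to-patch interaction. The function names, token ordering, and tensor shapes are assumptions made for illustration.

```python
# Hypothetical sketch of TAC's two visual-side ideas (not the authors' code):
# (1) a few clustering centers of encoder features act as task-aware pre-context;
# (2) a masked attention keeps the pre-context from affecting patch-to-patch interaction.
import torch
import torch.nn.functional as F


def kmeans_centers(features: torch.Tensor, k: int, iters: int = 20) -> torch.Tensor:
    """Plain k-means over [N, D] features; returns [k, D] centers used as pre-context tokens."""
    centers = features[torch.randperm(features.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(features, centers).argmin(dim=1)  # nearest-center assignment
        for j in range(k):
            member = assign == j
            if member.any():
                centers[j] = features[member].mean(dim=0)
    return centers


def visual_attention_mask(n_patches: int, n_prompt: int, n_ctx: int) -> torch.Tensor:
    """Additive mask for tokens ordered as [patches | learnable prompts | pre-context].

    Patch queries are blocked from attending to pre-context keys, so the visual
    pre-context cannot interfere with the interaction among patch embeddings;
    prompt tokens may attend to everything.
    """
    total = n_patches + n_prompt + n_ctx
    mask = torch.zeros(total, total)
    mask[:n_patches, n_patches + n_prompt:] = float("-inf")  # patches -> pre-context blocked
    return mask


if __name__ == "__main__":
    feats = torch.randn(512, 768)              # stand-in for encoder features of downstream-task images
    pre_context = kmeans_centers(feats, k=8)   # a few centers summarize the task's structure
    # Assumed token ordering: [196 patches | 4 learnable prompts | 8 visual pre-context tokens].
    attn_mask = visual_attention_mask(n_patches=196, n_prompt=4, n_ctx=8)
    qkv = torch.randn(1, 8, 208, 64)           # dummy [batch, heads, tokens, head_dim]
    out = F.scaled_dot_product_attention(qkv, qkv, qkv, attn_mask=attn_mask)
```

In this sketch the mask is additive, so it can be passed directly to a standard attention call; swapping in the real encoder features and token layout would be needed to match the paper's setting.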