STAR: Test-Time Adaptation Can Enhance Universal Prompt Learning for Vision-Language Models
Abstract
This paper studies the problem of universal test-time prompt learning for vision-language models (VLMs), which aims to enhance prompt learning for a pre-trained VLM using unlabeled target data that contains out-of-distribution (OOD) samples. However, existing test-time adaptation approaches often overlook class-specific diversity in the target domain and rely on unreliable pseudo-labels due to inadequate uncertainty estimation, which can introduce additional adaptation bias at test time. To this end, we propose a novel framework named Separability-aware Conjugate Optimization with Prototypical Retrieval (STAR) for universal test-time prompt learning of VLMs. The core of STAR is to incorporate a separability-aware gating mechanism into conjugate optimization for reliable pseudo-labeling in the presence of OOD samples. In particular, we first compute the Fisher score to quantify the separability between in-distribution (ID) and OOD samples, which guides a soft gating mechanism for divided training. We then employ conjugate optimization to derive reliable pseudo-labels for unlabeled data during test-time adaptation. To further mitigate bias in OOD detection, we maintain a dynamic memory bank that stores high-confidence samples to build class-wise prototypes, which then serve as queries for prototypical retrieval to calibrate OOD detection. Extensive experiments on multiple benchmarks demonstrate that STAR consistently outperforms competing baseline methods.
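The components named in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical rendering, not the paper's implementation: the Fisher score is taken as the classic Fisher criterion over per-sample confidence scores, the soft gate as a sigmoid around a threshold, and the memory bank as a running per-class mean of high-confidence features queried by cosine similarity. The function and class names (`fisher_score`, `soft_gate`, `PrototypeBank`) and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def fisher_score(id_scores, ood_scores):
    # Fisher criterion: squared mean gap over summed variances.
    # Larger values indicate better ID/OOD separability.
    # (Hypothetical form; the paper's exact definition may differ.)
    mu_id, mu_ood = id_scores.mean(), ood_scores.mean()
    return (mu_id - mu_ood) ** 2 / (id_scores.var() + ood_scores.var() + 1e-8)

def soft_gate(scores, threshold, temperature=10.0):
    # Sigmoid gate: weights near 1 for likely-ID samples,
    # near 0 for likely-OOD samples, enabling divided training.
    return 1.0 / (1.0 + np.exp(-temperature * (scores - threshold)))

class PrototypeBank:
    """Dynamic memory of class-wise prototypes built from
    high-confidence features (illustrative running-mean variant)."""

    def __init__(self, num_classes, dim):
        self.protos = np.zeros((num_classes, dim))
        self.counts = np.zeros(num_classes)

    def update(self, feat, label):
        # Incremental mean of stored high-confidence features per class.
        self.counts[label] += 1
        self.protos[label] += (feat - self.protos[label]) / self.counts[label]

    def retrieve(self, feat):
        # Cosine similarity of a query feature to each class prototype,
        # usable to calibrate an OOD-detection score.
        p = self.protos / (np.linalg.norm(self.protos, axis=1, keepdims=True) + 1e-8)
        f = feat / (np.linalg.norm(feat) + 1e-8)
        return p @ f
```

In this sketch, a batch would first be scored for ID/OOD separability, gated softly rather than hard-thresholded, and then its pseudo-labels cross-checked against prototype retrieval before adaptation.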