Dynamic Logits Adjustment and Exploration for Test-Time Adaptation in Vision Language Models
Abstract
Existing Test-Time Adaptation (TTA) methods for Vision-Language Models (VLMs) focus on designing efficient adaptation parameters (e.g., prompts or residual prototypes) and predominantly rely on high-confidence samples obtained via entropy-based filtering. However, this prevailing paradigm implicitly inherits the VLM’s class-wise prediction biases and covers the test distribution insufficiently, rendering the adaptation process biased and insufficiently exploratory. To overcome these limitations, we propose Dynamic Logits Adjustment and Exploration (DLAE), a novel framework that integrates Dynamic Logit Adjustment (DLA) with a Consistency-Guided Exploratory Cache (CGEC). DLA dynamically recalibrates the model’s logits based on statistics of its test-time predictions, thereby mitigating class-wise prediction inconsistencies. Unlike traditional cache mechanisms, CGEC actively identifies additional samples near decision boundaries whose predicted labels are sensitive to the logit adjustment, thereby exploring beyond high-confidence samples alone. By enforcing semantic and temporal consistency, the cache preserves the reliability of the selected samples while enabling cautious yet effective exploration of low-confidence regions, ultimately yielding stable and reliable adaptation. Extensive experiments across multiple vision-language benchmarks demonstrate that our approach consistently surpasses state-of-the-art TTA methods, showing superior stability, adaptability, and generalization.
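The abstract does not give DLA's exact formula or CGEC's selection rule, so the following is only a minimal sketch under stated assumptions, meant to illustrate the two ideas. We assume DLA takes the standard logit-adjustment form, subtracting `tau * log(prior)` from the logits, where `prior` is a running exponential moving average (EMA) of the model's softmax predictions on the test stream. We also assume a sample counts as "sensitive to the logit adjustment" when its argmax label changes after adjustment. The class `DynamicLogitAdjuster`, the function `boundary_candidates`, and the parameters `momentum`, `tau`, and `eps` are hypothetical names, not the authors' implementation, and the semantic and temporal consistency checks are omitted entirely.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class DynamicLogitAdjuster:
    """Hypothetical sketch: tracks an EMA estimate of the class-prediction
    distribution on the test stream and subtracts tau * log(prior) from logits."""

    def __init__(self, num_classes, momentum=0.99, tau=1.0, eps=1e-8):
        self.prior = np.full(num_classes, 1.0 / num_classes)  # start uniform
        self.momentum = momentum
        self.tau = tau
        self.eps = eps

    def update(self, logits):
        # EMA of the mean softmax prediction over the current test batch.
        batch_mean = softmax(logits).mean(axis=0)
        self.prior = self.momentum * self.prior + (1 - self.momentum) * batch_mean

    def adjust(self, logits):
        # Over-predicted classes (large prior) are penalized the most.
        return logits - self.tau * np.log(self.prior + self.eps)


def boundary_candidates(raw_logits, adjusted_logits):
    # Assumed rule: a sample whose argmax flips under the adjustment is
    # treated as a near-boundary candidate for exploratory caching.
    return np.argmax(raw_logits, axis=-1) != np.argmax(adjusted_logits, axis=-1)


# Usage: pretend the model has been over-predicting class 0.
adj = DynamicLogitAdjuster(num_classes=3)
adj.prior = np.array([0.6, 0.3, 0.1])
logits = np.array([[2.0, 1.9, 0.0],   # close call between classes 0 and 1
                   [3.0, 0.0, 0.0]])  # confident class-0 prediction
flips = boundary_candidates(logits, adj.adjust(logits))
print(flips.tolist())  # [True, False]
```

Under this assumed rule, only the close-call sample is flagged: the adjustment shifts its prediction from the over-predicted class 0 to class 1, while the confident prediction survives. An entropy filter alone would likely discard such a sample as low-confidence, which is the kind of region the abstract says CGEC explores, subject to its consistency constraints.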