From Quantized Residuals to Continuous Prompts for Few-Shot Class-Incremental Learning in Vision-Language Models
Abstract
Few-shot Class-Incremental Learning (FSCIL) requires learning new classes from very limited data while preventing catastrophic forgetting. Existing methods rely mainly on visual features and are prone to overfitting, while recent vision–language models (VLMs) offer better transferability but suppress fine-grained information due to contrastive feature decorrelation. Moreover, current FSCIL approaches often use static or fully optimizable prompts, making them either rigid or susceptible to semantic drift in incremental sessions. We introduce QR-Prompt, a residual-driven framework that leverages the visual–textual feature residual of VLMs to recover discriminative fine-grained cues missing from the contrastive space. To ensure stability, we propose Discriminative Subspace Quantization (DSQ), which builds a discrete memory of residual subspaces. To enable plasticity, a Hierarchical Prompt Encoder (HPE) and Prompt Composer (PC) transform these discrete codes into continuous, class-adaptive prompts for novel classes. We derive bounds relating DSQ codebook size to generalization and classification margin, and achieve consistent improvements over state-of-the-art FSCIL methods on CUB200, CIFAR100, and miniImageNet. Our results show that residual-based quantization combined with hierarchical prompt composition yields stable and expressive VLM adaptation for FSCIL.
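Below is a minimal, illustrative Python sketch of the core idea summarized above: forming the visual-textual feature residual and snapping it to the nearest entry of a discrete codebook, in the spirit of DSQ. The names (quantize_residual, img_feat, txt_feat, codebook) and the plain nearest-neighbour vector quantization are assumptions for illustration, not the authors' implementation of DSQ, HPE, or PC.

import torch
import torch.nn.functional as F

def quantize_residual(img_feat, txt_feat, codebook):
    # Residual between L2-normalized image and text embeddings; this is
    # where fine-grained cues outside the contrastive space are assumed to live.
    residual = F.normalize(img_feat, dim=-1) - F.normalize(txt_feat, dim=-1)
    # Nearest-neighbour lookup in a discrete residual codebook (plain VQ).
    dists = torch.cdist(residual, codebook)   # (B, K) pairwise distances
    idx = dists.argmin(dim=-1)                # (B,) code indices
    return codebook[idx], idx                 # quantized residuals, codes

# Toy usage with hypothetical sizes: batch 4, feature dim 512, 64 codes.
B, D, K = 4, 512, 64
codes, idx = quantize_residual(torch.randn(B, D), torch.randn(B, D),
                               torch.randn(K, D))

In the full method, the discrete codes would then be decoded by the prompt modules into continuous, class-adaptive prompts; that step is omitted from this sketch.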