Protect to Adapt: Subspace-Constrained Adaptation with Ranked Negative Prompt Feedback for Few-Shot Action Recognition
Abstract
Adapting Vision–Language Models (VLMs) to few-shot action recognition (FSAR) often trades stability for accuracy: task-specific gains can trigger catastrophic forgetting of domain-general knowledge and reduce inter-class margins. In few-shot episodes, each query is contrasted with only one positive class and a few negatives, so the text encoder sees limited prompt diversity and rarely observes hard counter-examples near decision boundaries. We propose Protect-to-Adapt (P2A), a parameter-efficient fine-tuning method with two complementary modules. Orthogonal Subspace Control (OSC) estimates a principal semantic subspace of the pre-trained backbone and constrains low-rank updates to its orthogonal complement, preserving domain-general semantics while allowing task-specific adaptation. Ranked Negative-prompt Curriculum (RNC) uses a large language model to generate verifier-filtered negative prompts of increasing difficulty; these class-specific hard counter-examples enlarge margins and sharpen decision boundaries under few-shot conditions. With only 2\% of backbone parameters trainable, P2A achieves state-of-the-art performance on five FSAR benchmarks and substantially reduces catastrophic forgetting in a cross-dataset continual-learning setting, where the model is adapted sequentially to multiple video datasets without replay.
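The subspace constraint in OSC can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the choice of the top-$k$ left singular vectors of a frozen weight matrix as the "principal semantic subspace", and the function names below, are illustrative assumptions. The key operation is projecting a raw low-rank update onto the orthogonal complement of the protected subspace, $(I - U_k U_k^\top)\,\Delta W$, so the update cannot perturb the preserved directions.

```python
import numpy as np

def principal_subspace(W, k):
    # Top-k left singular vectors of the frozen pre-trained weights,
    # used here as a stand-in for the principal semantic subspace.
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]

def constrain_update(delta_W, U_k):
    # Project the raw low-rank (e.g. LoRA) update onto the orthogonal
    # complement of the protected subspace: (I - U_k U_k^T) @ delta_W.
    return delta_W - U_k @ (U_k.T @ delta_W)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))        # frozen backbone weight matrix (toy size)
delta_W = rng.standard_normal((64, 32))  # unconstrained task-specific update
U_k = principal_subspace(W, k=8)
delta_safe = constrain_update(delta_W, U_k)

# The constrained update has no component inside the protected subspace.
print(np.allclose(U_k.T @ delta_safe, 0.0, atol=1e-8))  # True
```

In words: the frozen weights `W` are never modified; only the residual of `delta_W` outside the estimated subspace is applied, which is what lets task-specific adaptation proceed without overwriting the protected domain-general directions.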