PACT: Phase-Like Transition Constraints in Adapter-Based Continual Learning of Vision-Language Models
Xuan Wang ⋅ Guiguang Ding ⋅ Jungong Han
Abstract
Continual Learning (CL) enables Vision-Language Models (VLMs) to acquire new capabilities while retaining prior knowledge, for example, by employing task‑specific adapters. Existing CL approaches typically optimize these adapters to convergence, often with (near-)orthogonality constraints to reduce interference; however, isolating adapters in orthogonal subspaces can suppress cross‑task transfer and sharing. To address this problem, we provide a new perspective based on PAC-Bayesian analysis: once the per‑task optimization has converged, adapters should be further shaped to satisfy \underline{P}hase‑like tr\underline{A}nsition \underline{C}ons\underline{T}raints (PACT) -- a two-part formulation that (i) specifies a phase‑like transition relation among adapters and (ii) imposes explicit constraints that enforce this relation. Under PACT, adapter dynamics resemble the phase transition of water: the system gravitates toward either a “frozen” (history‑preserving, tightly constrained) or a “melted” (task‑adaptive, free) regime, while moving between them smoothly rather than via hard thresholds. We operationalize PACT by coupling stability and plasticity regularizers within a two‑branch Vision Transformer (ViT), seeding adapters with a Stable Adapter Initialization (SAI), and introducing a Prior Anchoring (PA) mechanism, thereby inducing phase‑like adapter dynamics. Across diverse CL settings, PACT surpasses state‑of‑the‑art methods while reducing the number of trainable parameters by $36.96\%$ relative to standard adapter‑based baselines. Our code will be released publicly.