SineProject: Machine Unlearning for Stable Vision–Language Alignment
Abstract
Multimodal Large Language Models (MLLMs) increasingly need to forget specific knowledge, such as unsafe or private information, without full retraining. However, existing unlearning methods often disrupt vision–language alignment, causing models to reject benign queries alongside harmful ones. We trace this failure to the projector network: during unlearning, its Jacobian becomes severely ill-conditioned, leading to unstable optimization and drift in cross-modal embeddings. We introduce SineProject, a simple approach that augments the frozen projector with sinusoidally modulated trainable parameters, improving the Jacobian's spectral conditioning and stabilizing alignment throughout unlearning. Evaluated on standard safety and privacy unlearning benchmarks with LLaVA-v1.5-7B and 13B, SineProject reduces benign-query refusals while achieving complete forgetting of targeted information, delivering state-of-the-art forget–retain trade-offs with negligible computational overhead.
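To make the core idea concrete, the following is a minimal, purely illustrative sketch of what augmenting a frozen projector with sinusoidally modulated trainable parameters could look like in PyTorch. The abstract does not specify the parameterization, so the class name, the zero-initialized residual, and the learnable-phase sine gate below are all assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn


class SineModulatedProjector(nn.Module):
    """Hypothetical sketch: a frozen linear projector plus a trainable
    residual whose contribution is gated by a sinusoid. All design
    choices here are assumptions for illustration only."""

    def __init__(self, frozen_proj: nn.Linear):
        super().__init__()
        self.frozen = frozen_proj
        for p in self.frozen.parameters():
            p.requires_grad_(False)  # keep the original alignment weights fixed
        out_f, in_f = frozen_proj.weight.shape
        # Trainable residual, zero-initialized so training starts from the
        # unmodified projector (no initial drift in cross-modal embeddings).
        self.delta = nn.Parameter(torch.zeros(out_f, in_f))
        # Learnable per-row phase for the sinusoidal gate (assumed form).
        self.phase = nn.Parameter(torch.zeros(out_f, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sin(.) bounds the gate to [-1, 1], keeping the residual's
        # contribution small — one plausible route to better-conditioned
        # projector Jacobians during unlearning updates.
        gate = torch.sin(self.phase)
        w = self.frozen.weight + gate * self.delta
        return nn.functional.linear(x, w, self.frozen.bias)
```

Because the residual is zero-initialized, the wrapped module reproduces the frozen projector exactly at the start of unlearning, and only the `delta` and `phase` parameters receive gradients.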