pH-Strips for Selective Forgetting: A Blunt but Fast Diagnostic Baseline for Machine Unlearning
Abstract
Machine Unlearning (MU), the erasure of undesirable content from Artificial Intelligence (AI) models, plays an essential role in developing safe and trustworthy AI systems. Despite notable advances, the standard MU baseline remains retraining from scratch without the data to be removed, which is computationally expensive and financially prohibitive. To address this challenge, we propose Machine Unlearning pH-Test (MUpHT), a simple yet efficient \textbf{training-free} and \textbf{retain-set-free} MU algorithm designed explicitly as a \textbf{diagnostic baseline} that serves as a practical evaluation reference for future MU methods. Our method eliminates the low-dimensional subspaces associated with undesirable concepts from the space spanned by the model's weight vectors, thereby rendering the model ``blind'' to this undesirable content. Additionally, we extend the method with a retain-aware variant that handles entangled features by leveraging a generalized Rayleigh quotient over the undesirable and retain sets, enabling an efficient tradeoff between preserving retained knowledge and suppressing undesirable knowledge. Our method enables evaluation of MU across diverse visual tasks, including concept erasure for classification, image generation, and multimodal applications. By producing an unlearned model instantly from only a few samples, our method serves as a quick litmus test for MU.
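
As an illustrative sketch only, the subspace removal and the generalized Rayleigh quotient mentioned above can be written as follows. The notation here ($W$, $F$, $R$, $U_k$, $Q_k$, $C_f$, $C_r$, $k$, $\varepsilon$) is assumed for illustration and is not taken from the method's formal definition. Let $W$ be a $d_{\mathrm{out}} \times d_{\mathrm{in}}$ weight matrix, let the columns of $F$ be the $d_{\mathrm{in}}$-dimensional features of the $n_f$ undesirable samples, and let $U_k$ hold the top-$k$ left singular vectors of $F$. One plausible training-free update projects the rows of $W$ onto the orthogonal complement of $\mathrm{span}(U_k)$:
\[
W' = W \left( I - U_k U_k^{\top} \right), \qquad W' x = 0 \quad \text{for all } x \in \mathrm{span}(U_k),
\]
since $U_k^{\top} U_k = I$. For the retain-aware case, with the columns of $R$ holding the features of the $n_r$ retain samples, $C_f = \frac{1}{n_f} F F^{\top}$, $C_r = \frac{1}{n_r} R R^{\top}$, and a small assumed regularizer $\varepsilon > 0$, the directions to remove could be chosen to maximize
\[
\rho(v) = \frac{v^{\top} C_f \, v}{v^{\top} \left( C_r + \varepsilon I \right) v},
\]
whose stationary points satisfy the generalized eigenproblem
\[
C_f \, v = \lambda \left( C_r + \varepsilon I \right) v, \qquad \rho(v) = \lambda .
\]
Directions with large $\lambda$ carry high energy on the undesirable set and low energy on the retain set. Under these assumptions, the top-$k$ generalized eigenvectors would be orthonormalized into $Q_k$ and removed via $W' = W \left( I - Q_k Q_k^{\top} \right)$, with $k$ and $\varepsilon$ controlling the tradeoff between suppression and retention.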