

Poster

Targeted Forgetting of Image Subgroups in CLIP Models

Zeliang Zhang · Gaowen Liu · Charles Fleming · Ramana Kompella · Chenliang Xu


Abstract:

Foundation models (FMs) such as CLIP have demonstrated impressive zero-shot performance across various tasks by leveraging large-scale, unsupervised pre-training. However, they often inherit harmful or unwanted knowledge from noisy internet-sourced datasets, which compromises their reliability in real-world applications. Existing model unlearning methods either rely on access to the pre-training datasets or focus on coarse-grained unlearning (e.g., entire classes), leaving a critical gap for fine-grained unlearning. In this paper, we address the challenging scenario of selectively forgetting specific portions of knowledge within a class, without access to the pre-training data, while preserving the model's overall performance. We propose a novel three-stage approach that progressively unlearns targeted knowledge while mitigating over-forgetting. Our method consists of (1) a forgetting stage that fine-tunes on the samples to be forgotten, (2) a reminding stage that restores the model's performance on the retained samples, and (3) a restoring stage that recovers zero-shot capabilities via model souping. We also introduce knowledge distillation to handle the distribution disparity between the forgetting/retained samples and the unseen pre-training data. Extensive experiments demonstrate that our approach effectively unlearns specific subgroups while maintaining strong zero-shot performance on other tasks and datasets, outperforming baseline unlearning methods.
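A minimal PyTorch sketch of the three-stage pipeline described above. The abstract does not specify the exact objectives, so the choices below are illustrative assumptions rather than the authors' formulation: gradient ascent on the forget set for the forgetting stage, cross-entropy plus distillation from a frozen copy of the original model for the reminding stage, and a convex combination of weights for model souping in the restoring stage.

```python
# Sketch of the three-stage unlearning pipeline for a CLIP-style classifier head.
# All loss choices and hyperparameters are assumptions for illustration only.
import copy
import torch
import torch.nn.functional as F


def forgetting_stage(model, forget_loader, lr=1e-5):
    """Stage 1: fine-tune on the subgroup to be forgotten.
    Gradient ascent on the forget samples is one common choice; the paper
    only states that the model is fine-tuned on these samples."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for images, labels in forget_loader:
        logits = model(images)
        loss = -F.cross_entropy(logits, labels)  # ascend the loss on forget samples
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def reminding_stage(model, teacher, retain_loader, lr=1e-5, kd_weight=1.0, temp=2.0):
    """Stage 2: restore performance on retained samples.
    A frozen copy of the original model serves as a knowledge-distillation
    teacher (an assumed instantiation of the distillation mentioned above)."""
    teacher.eval()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for images, labels in retain_loader:
        logits = model(images)
        with torch.no_grad():
            t_logits = teacher(images)
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / temp, dim=-1),
                      F.softmax(t_logits / temp, dim=-1),
                      reduction="batchmean") * (temp * temp)
        loss = ce + kd_weight * kd
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def restoring_stage(unlearned_model, original_model, alpha=0.5):
    """Stage 3: model souping, interpreted here as averaging the unlearned
    weights with the original zero-shot weights to recover general capability."""
    souped = copy.deepcopy(unlearned_model)
    with torch.no_grad():
        for p_s, p_u, p_o in zip(souped.parameters(),
                                 unlearned_model.parameters(),
                                 original_model.parameters()):
            p_s.copy_(alpha * p_u + (1.0 - alpha) * p_o)
    return souped
```

In this reading, the souping coefficient `alpha` trades off how much of the unlearning is kept against how much of the original zero-shot behavior is restored; the actual mechanism and balance used in the paper may differ.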
