Poster
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George · Karthik Nandan Dasaraju · Rutheesh Reddy Chittepu · Konda Reddy Mopuri
Text-to-image models such as Stable Diffusion, DALL·E, and Midjourney have recently gained immense popularity. However, they are trained on vast amounts of data that may include private, explicit, or copyrighted material used without permission, raising serious legal and ethical concerns. In light of recent regulations aimed at protecting individual data privacy, there has been a surge of Machine Unlearning methods designed to remove specific concepts from these models. However, we identify a critical flaw in these unlearning techniques: unlearned concepts resurface when the models are subsequently fine-tuned, even on general or unrelated prompts. In this paper, through an extensive study, we demonstrate for the first time the unstable nature of existing unlearning methods in text-to-image diffusion models. We introduce a framework with a set of measures for analyzing the stability of existing unlearning methods. Further, the paper offers preliminary insights into a plausible explanation for the instability of mapping-based unlearning methods, which can guide future research toward more robust unlearning techniques. Anonymized code implementing the proposed framework is provided.
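A minimal sketch of the kind of stability probe the abstract describes: generate images for an erased concept before and after briefly fine-tuning the unlearned model on unrelated data, and compare how strongly the concept re-emerges. This is not the authors' framework; the checkpoint path, the erased concept, the CLIP-based revival score, and the fine-tuning stub are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP is used here as a simple, assumed proxy for "how present is the erased concept".
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def concept_score(images, concept: str) -> float:
    """Mean CLIP image-text similarity to the erased concept (higher = more revived)."""
    inputs = clip_proc(text=[concept], images=images,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**inputs)
    return out.logits_per_image.mean().item()

def probe(pipe, concept: str, n: int = 8) -> float:
    """Sample n images from the erased-concept prompt and score them."""
    images = pipe([concept] * n, num_inference_steps=30).images
    return concept_score(images, concept)

# 1. Load an unlearned checkpoint (hypothetical path; e.g. a model with a style erased).
pipe = StableDiffusionPipeline.from_pretrained("path/to/unlearned-model").to(device)
erased_prompt = "a painting in the style of Van Gogh"  # assumed erased concept
score_before = probe(pipe, erased_prompt)

# 2. Fine-tune briefly on prompts *unrelated* to the erased concept, using the
#    standard denoising objective (loop omitted here for brevity).
# fine_tune(pipe.unet, unrelated_dataset, steps=1000)

# 3. Re-probe: a large jump in the score would indicate the concept has revived.
score_after = probe(pipe, erased_prompt)
print(f"concept score before fine-tuning: {score_before:.3f}, after: {score_after:.3f}")
```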