

Poster

Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI

Won Jun Kim · Hyungjin Chung · Jaemin Kim · Sangmin Lee · Byeongsu Sim · Jong Chul Ye


Abstract:

Gradient-based methods are a prototypical family of "explainability for AI" (XAI) techniques, especially for image-based models. Nonetheless, they have several shortcomings: they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, yielding explanations that are neither faithful to the model nor well aligned with human perception. To overcome these challenges, we introduce "Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG)", a novel method that serves as a better basis for explaining a given neural network than the traditional gradient. Specifically, by leveraging ensemble Kalman filters and diffusion models, we derive a derivative-free approximation of the model's gradient projected onto the data manifold, requiring access only to the model's outputs (i.e., a completely black-box setting). We demonstrate the effectiveness of FreeMCG by applying it to counterfactual generation and feature attribution, which have traditionally been treated as distinct tasks. Through comprehensive evaluation on both tasks, we show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
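To illustrate the core idea of a derivative-free, ensemble-based gradient estimate, the sketch below correlates random input perturbations with black-box output deviations, in the spirit of an ensemble Kalman update. This is only a minimal illustration of the principle, not the FreeMCG algorithm itself: FreeMCG additionally projects the estimate onto the data manifold via a diffusion model, which is omitted here. All function and parameter names are hypothetical.

```python
import numpy as np

def ensemble_gradient_estimate(f, x, n_particles=64, sigma=0.1, seed=0):
    """Derivative-free estimate of grad f(x) for a scalar black-box f.

    EnKF-style sketch (hypothetical, illustrative only): the cross-covariance
    between zero-mean input perturbations and the resulting output deviations
    approximates sigma^2 * grad f(x) for smooth f, so dividing by sigma^2
    recovers the gradient without any backpropagation through f.
    """
    rng = np.random.default_rng(seed)
    # Ensemble of perturbed inputs around x; only forward evaluations of f.
    perturbations = rng.normal(0.0, sigma, size=(n_particles, x.size))
    outputs = np.array([f(x + p) for p in perturbations])
    deviations = outputs - outputs.mean()
    # Monte Carlo cross-covariance, rescaled to estimate the gradient.
    return (perturbations * deviations[:, None]).mean(axis=0) / sigma**2

# Sanity check on a linear function f(v) = v @ w, whose gradient is w.
w = np.array([1.0, -2.0, 0.5])
g = ensemble_gradient_estimate(lambda v: v @ w, np.zeros(3),
                               n_particles=5000, sigma=0.05)
```

Because the estimator uses only model outputs, it applies in the fully black-box setting the abstract describes; the ensemble size trades query cost against estimation variance.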
