

Poster

Interpretable Generative Models through Post-hoc Concept Bottlenecks

Akshay R. Kulkarni · Ge Yan · Chung-En Sun · Tuomas Oikarinen · Tsui-Wei Weng


Abstract: Concept bottleneck models (CBMs) aim to produce inherently interpretable models that rely on human-understandable concepts for their predictions. However, the existing approach to designing interpretable generative models based on CBMs is neither efficient nor scalable: it requires expensive generative model training from scratch as well as real images with labor-intensive concept supervision. To address these challenges, we present two novel, low-cost methods that build interpretable generative models through post-hoc interpretability: the concept-bottleneck autoencoder (CB-AE) and the concept controller (CC). Both methods enable efficient and scalable training by using generated images and require minimal to no concept supervision. Our proposed methods generalize across modern generative model families, including generative adversarial networks and diffusion models. We demonstrate the superior interpretability and steerability of our methods on standard datasets such as CelebA, CelebA-HQ, and CUB, with large improvements (average 25%) over prior work, while being 4-15× faster to train. Finally, we perform a large-scale user study to validate the interpretability and steerability of our methods.
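To make the post-hoc concept-bottleneck idea concrete, below is a minimal illustrative sketch in PyTorch. It is not the authors' CB-AE implementation; the module names, dimensions, losses, and the use of pseudo concept labels from an off-the-shelf classifier are all assumptions. The sketch only shows the general pattern: an autoencoder whose bottleneck is a vector of concept logits, trained on features from a frozen pretrained generator so that concepts can be read off and edited without retraining the generator.

```python
import torch
import torch.nn as nn

class ConceptBottleneckAE(nn.Module):
    """Illustrative post-hoc concept bottleneck (hypothetical sketch):
    maps a pretrained generator's intermediate features to concept
    logits, then decodes back to the feature space so generation can
    continue from the (optionally edited) concepts."""

    def __init__(self, feat_dim: int, num_concepts: int, hidden: int = 256):
        super().__init__()
        # Encoder: generator features -> concept logits (the bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_concepts),
        )
        # Decoder: concept logits -> reconstructed generator features
        self.decoder = nn.Sequential(
            nn.Linear(num_concepts, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feats: torch.Tensor):
        concept_logits = self.encoder(feats)
        recon = self.decoder(concept_logits)
        return concept_logits, recon


# Toy training step on generated (not real) features, supervised with
# pseudo concept labels (e.g. from an off-the-shelf concept classifier;
# this labeling source is an assumption, not from the paper).
feat_dim, num_concepts, batch = 512, 8, 16
cbae = ConceptBottleneckAE(feat_dim, num_concepts)
opt = torch.optim.Adam(cbae.parameters(), lr=1e-4)

feats = torch.randn(batch, feat_dim)          # stand-in for generator features
pseudo_labels = torch.randint(0, 2, (batch, num_concepts)).float()

logits, recon = cbae(feats)
loss = (nn.functional.binary_cross_entropy_with_logits(logits, pseudo_labels)
        + nn.functional.mse_loss(recon, feats))  # preserve generation quality
loss.backward()
opt.step()
```

In this pattern, steering amounts to flipping or shifting a concept logit before decoding, so a single concept of a generated image can be changed while the rest of the feature vector is reconstructed largely intact.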
