

Poster

Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification

Zequn Zeng · Yudi Su · Jianqiao Sun · Tiansheng Wen · Hao Zhang · Zhengjue Wang · Bo Chen · Hongwei Liu · Jiawei Ma


Abstract:

Concept-based models are inherently interpretable, as they map black-box representations to human-understandable concepts, making the decision-making process more transparent. These models allow users to understand the reasoning behind predictions, which is crucial for high-stakes applications. However, they often introduce domain-specific concepts into the final predictions, which can undermine their generalization capabilities. In this paper, we propose a novel Language-guided Concept-Erasing (lanCE) framework. Specifically, we empirically demonstrate that pre-trained vision-language models (VLMs) can approximate distinct visual domain shifts via a large set of domain descriptors. Based on these findings, we introduce a novel plug-in domain descriptor orthogonality (DDO) regularizer to mitigate the impact of domain-specific concepts on the final predictions. To simulate a wide range of unseen visual domains, we generate the set of domain descriptors by prompting large language models (LLMs). Notably, the proposed DDO regularizer is agnostic to the design of concept-based models and can therefore be integrated into a wide variety of such models. By integrating the DDO regularizer into several prevailing models and evaluating them on two standard domain generalization benchmarks and three new benchmarks introduced in this paper, we demonstrate that the DDO loss significantly improves out-of-distribution (OOD) generalization over previous state-of-the-art concept-based models. Code is available in the supplementary material.
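The abstract does not specify the exact form of the DDO regularizer, but the core idea it describes — penalizing alignment between prediction-relevant concept directions and language-derived domain-descriptor embeddings — can be illustrated with a minimal sketch. All names here (`ddo_regularizer`, the weight and descriptor matrices) are hypothetical and assume CLIP-style embeddings in a shared space; this is not the paper's implementation.

```python
import numpy as np

def ddo_regularizer(concept_weights: np.ndarray,
                    domain_descriptors: np.ndarray) -> float:
    """Illustrative orthogonality penalty (hypothetical form).

    concept_weights:    (k, d) rows used by the classifier in concept space.
    domain_descriptors: (m, d) text embeddings of LLM-generated domain
                        descriptors (e.g., "a sketch of ...", "a painting of ...").

    Returns the mean squared cosine similarity between every weight row and
    every descriptor: 0 when they are mutually orthogonal (domain-invariant),
    1 when perfectly aligned (domain-specific).
    """
    W = concept_weights / np.linalg.norm(concept_weights, axis=1, keepdims=True)
    D = domain_descriptors / np.linalg.norm(domain_descriptors, axis=1, keepdims=True)
    cos = W @ D.T                     # (k, m) pairwise cosine similarities
    return float(np.mean(cos ** 2))   # squared, so sign of alignment is irrelevant

# Toy check: a weight orthogonal to the descriptor incurs no penalty,
# a weight parallel to it incurs the maximum penalty.
w_orth = np.array([[1.0, 0.0]])
w_para = np.array([[0.0, 2.0]])
desc = np.array([[0.0, 1.0]])
print(ddo_regularizer(w_orth, desc))  # → 0.0
print(ddo_regularizer(w_para, desc))  # → 1.0
```

In training, such a term would be added to the usual classification loss with a trade-off coefficient, nudging the model to rely on concepts whose directions carry no domain-style information.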
