Tutorial Wed, Jun 3, 2026 • 8:00 AM – 12:00 PM PDT

Principled Interpretability in Vision Models: From Mechanistic Understanding to Interpretable Models by Design

Tsui-Wei (Lily) Weng · Tuomas Oikarinen

Abstract

As deep learning systems are increasingly deployed in high-stakes applications, understanding their behavior is critical for ensuring trust and safety. Interpretability provides essential tools to explain, debug, and improve these models. However, the field remains fragmented, spanning a wide range of methods and assumptions, while lacking standardized evaluation protocols. This tutorial aims to provide a unified overview of interpretability in deep learning, bridging post-hoc mechanistic understanding and methods for designing inherently interpretable deep learning models. By the end of this tutorial, attendees will gain a solid understanding of modern interpretability methods for deep learning models, how to rigorously evaluate them, and open research directions in this critical area.