

Paper in Workshop: Mechanistic Interpretability for Vision

Wavelet-Based Mechanistic Interpretability of Vision Transformers via Frequency-Aware Ablations

Sophia Abraham · Jonathan Hauenstein · Walter Scheirer


Abstract:

We explore a wavelet-based interpretability framework for Vision Transformers (ViTs), aiming to analyze their reliance on frequency-specific representations. Through systematic ablations of wavelet subbands, we assess how different frequency components contribute to latent representations and attention mechanisms. Our empirical study on CIFAR-10 reveals that high-frequency details, particularly those captured by Haar wavelets, may influence reconstruction fidelity and attention distributions. While these preliminary findings suggest frequency-dependent behavior in ViT representations, further investigation is needed to generalize across datasets and architectures. This study highlights the potential of frequency-based interpretability but also underscores the need for more rigorous evaluation in larger, more diverse settings. To encourage further exploration, all experiment and method code is available in our GitHub repository.
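The abstract describes ablating wavelet subbands before probing the ViT. A minimal sketch of that idea is below, assuming the pipeline resembles: decompose each image with a Haar DWT, zero selected detail subbands, reconstruct, and then compare model behavior on the original versus ablated inputs. The function name and the ablation choices here are illustrative assumptions, not the authors' implementation (see their GitHub repository for the actual code).

```python
# Hypothetical sketch of frequency-aware subband ablation with a Haar wavelet.
# Not the authors' code; illustrates the general technique only.
import numpy as np
import pywt


def ablate_subbands(image, wavelet="haar", drop=("cH", "cV", "cD")):
    """Zero the named detail subbands of a single-level 2D DWT and reconstruct.

    image: 2D numpy array (one channel); apply per channel for RGB.
    drop:  subset of {"cH", "cV", "cD"} (horizontal, vertical, diagonal details).
    """
    cA, (cH, cV, cD) = pywt.dwt2(image, wavelet)
    bands = {"cH": cH, "cV": cV, "cD": cD}
    for name in drop:
        bands[name] = np.zeros_like(bands[name])
    return pywt.idwt2((cA, (bands["cH"], bands["cV"], bands["cD"])), wavelet)


# Example: drop all high-frequency detail from a CIFAR-10-sized single channel
# and measure the reconstruction error, one proxy for frequency reliance.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low_pass_only = ablate_subbands(img)
mse = float(np.mean((img - low_pass_only) ** 2))
print(f"Reconstruction MSE after dropping high-frequency subbands: {mse:.6f}")
```

In the setting the abstract describes, the ablated images would then be passed through the ViT alongside the originals to compare latent representations and attention distributions; that comparison step depends on the specific model and hooks used and is omitted here.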
