

Paper in Workshop: Mechanistic Interpretability for Vision

Uncovering Branch Specialization in InceptionV1 using k-sparse autoencoders

Matthew Bozoukov


Abstract:

Sparse autoencoders (SAEs) have been shown to recover interpretable features from the polysemantic neurons that superposition produces in neural networks. Previous work demonstrated that SAEs are an effective tool for extracting interpretable features from the early layers of InceptionV1. Since then, SAEs have improved considerably, but branch specialization remains an enigma in the later layers of InceptionV1. We show various examples of branch specialization in each of the mixed4a-4e layers, occurring in the 5x5 branch and in one 1x1 branch. We also provide evidence that branch specialization is consistent across layers: similar features across the model localize to branches of the same convolution size in their respective layers.
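The "k-sparse autoencoder" of the title refers to a TopK-style SAE, which enforces sparsity by keeping only the k largest latent pre-activations per input rather than penalizing activations with an L1 term. Below is a minimal sketch of such an autoencoder, assuming PyTorch; the dimensions, expansion factor, and value of k are illustrative placeholders, not hyperparameters taken from the paper.

```python
# A minimal sketch of a k-sparse (TopK) autoencoder, assuming PyTorch.
# All dimensions, the expansion factor, and k are illustrative
# placeholders, not hyperparameters taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KSparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # Encode, then keep only the k largest pre-activations per
        # sample, zeroing everything else; sparsity comes from this
        # hard TopK selection rather than from an L1 penalty.
        pre = self.encoder(x)
        top = torch.topk(pre, self.k, dim=-1)
        codes = torch.zeros_like(pre).scatter_(-1, top.indices, F.relu(top.values))
        return self.decoder(codes), codes


# Hypothetical usage: channel activations from one InceptionV1 branch,
# flattened to (batch, channels), with an 8x expansion in the SAE.
sae = KSparseAutoencoder(d_model=512, d_hidden=512 * 8, k=32)
acts = torch.randn(64, 512)
recon, codes = sae(acts)
loss = F.mse_loss(recon, acts)  # reconstruction objective
```

In this setup the reconstruction error is the training objective, and the k surviving latent units per input are the candidate interpretable features whose branch-wise localization the paper studies.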
