Align Once to Explain: Feature Alignment for Scalable B-cosification of Foundational Vision Transformers
Abstract
Foundational vision models have become the de facto standard for many vision tasks due to their strong performance, yet they remain opaque and hard to interpret. We present ALOE (ALign Once to Explain), a one-time, label-free feature-alignment approach that efficiently converts foundational vision models into inherently interpretable B-cos variants. Once aligned, the B-cos backbone serves as a drop-in replacement across several downstream tasks, amortizing the cost of interpretability. ALOE is robust across pre-training paradigms (supervised, self-supervised, vision–language) and is 100–1000× more data-efficient than training from scratch. On classification, it outperforms fully supervised B-cos models (e.g., +6.6 p.p. top-1 on ImageNet for ViT-B/16). It also retains strong linear-probing, k-NN, and zero-shot transfer performance, competitive with foundational backbones (DINOv3, SigLIP2) across diverse downstream datasets, while yielding well-localized and highly human-interpretable explanations by design. Code and models will be released.