ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
Abstract
Mixture-of-Experts (MoE) architectures expand model capacity by sparsely activating experts, but they suffer from two core challenges: (1) misalignment between router logits and each expert's internal structure leads to unstable routing and expert underutilization, and (2) load imbalances create straggler bottlenecks. Standard solutions, such as auxiliary load-balancing losses, can reduce load disparities but often weaken expert specialization and hurt downstream performance. To address these issues, we propose ERMoE, a sparse MoE transformer that reparameterizes each expert in a learned orthonormal eigenbasis and replaces learned gating logits with an Eigenbasis Score, the cosine similarity between input features and an expert's basis. This content-aware routing ties token assignments directly to experts' representation spaces, inherently stabilizing utilization and promoting interpretable specialization without sacrificing sparsity. Crucially, ERMoE eliminates the need for explicit balancing losses and avoids the interfering gradients they introduce. We demonstrate that ERMoE achieves state-of-the-art accuracy on ImageNet classification and on cross-modal image-text retrieval benchmarks (e.g., COCO, Flickr30K), while naturally producing flatter expert load distributions. Moreover, a 3D MRI variant (ERMoE-ba) improves brain age prediction accuracy by over 7% and yields anatomically interpretable expert specializations. ERMoE thus introduces a new architectural principle for sparse expert models, directly addressing core routing instabilities and enabling improved performance with scalable, interpretable specialization.
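To make the routing rule concrete, the sketch below shows one way the Eigenbasis Score could be computed. It assumes each expert carries an orthonormal basis B_e of shape (d, r) and that the score is the cosine similarity between a token and its projection onto that expert's subspace; the function names, shapes, and the top-1 routing in the toy usage are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def eigenbasis_scores(x, bases):
    """
    x:     (batch, d)            token features
    bases: (num_experts, d, r)   per-expert orthonormal bases (columns orthonormal)

    Returns (batch, num_experts) routing scores: cosine similarity between each
    token and its projection onto each expert's basis (higher = better aligned).
    NOTE: this is one plausible reading of the Eigenbasis Score, not the
    paper's verbatim definition.
    """
    # Project tokens onto each expert's subspace: coeffs = B^T x, proj = B coeffs
    coeffs = torch.einsum("bd,edr->ber", x, bases)       # (batch, E, r)
    proj = torch.einsum("edr,ber->bed", bases, coeffs)   # (batch, E, d)
    # Cosine similarity between each token and its in-subspace projection
    return F.cosine_similarity(x.unsqueeze(1), proj, dim=-1)

if __name__ == "__main__":
    torch.manual_seed(0)
    d, r, num_experts, batch = 16, 4, 8, 32
    # Random orthonormal bases via QR (stand-in for learned, reparameterized bases)
    bases = torch.linalg.qr(torch.randn(num_experts, d, r)).Q
    x = torch.randn(batch, d)
    scores = eigenbasis_scores(x, bases)                 # (batch, E)
    top1 = scores.argmax(dim=-1)                         # toy top-1 routing
    print(scores.shape, torch.bincount(top1, minlength=num_experts))
```

Because the score is computed directly against each expert's basis rather than from separately learned gating logits, token assignments follow the experts' representation spaces, which is the property the abstract credits for balanced utilization without an auxiliary loss.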