Spectral Mixture-of-Experts for Continual Learning
Abstract
While Parameter-Efficient Fine-Tuning with Mixture-of-Experts (MoE) is a promising approach to continual learning (CL), it suffers from two critical failure modes: structural interference, where updates to different experts conflict with one another, and compositional forgetting, where the model's routing policy drifts away from decisions made on earlier tasks. To address these issues, we introduce Spectral MoE, a novel framework for CL built on three core components. First, Spectral Experts are parameterized with disjoint spectral masks that confine each expert's learnable parameters to a distinct frequency subspace, ensuring a priori orthogonal updates that prevent structural interference. Second, a Dual-Router mechanism decouples an online router, which learns new tasks, from an offline memory, which archives historical expert importance. Finally, this offline memory enables a Dynamic Consistency Projection, a geometric constraint that suppresses router drift and adaptively shields experts in proportion to their past contributions, mitigating compositional forgetting. Validated on a strict cross-domain CL benchmark, our framework significantly outperforms existing methods, demonstrating both superior knowledge retention and plasticity on new tasks. Code will be released upon acceptance.
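
To make the first component concrete, the following minimal sketch illustrates the disjoint-spectral-mask idea; it is not the released implementation, and the function names (disjoint_masks, expert_delta), the random partition of the coefficient grid, and the choice of a 2-D DCT as the orthonormal transform are our own illustrative assumptions. Because an orthonormal transform preserves inner products, coefficients placed on disjoint frequency supports map to exactly orthogonal weight updates in parameter space:

import numpy as np
from scipy.fft import idctn

def disjoint_masks(shape, num_experts, seed=0):
    # Randomly partition the DCT coefficient grid into disjoint boolean
    # masks, one per expert (illustrative partition scheme, not the paper's).
    rng = np.random.default_rng(seed)
    owner = rng.integers(num_experts, size=shape)
    return [owner == e for e in range(num_experts)]

def expert_delta(coef, mask):
    # Place an expert's learnable coefficients on its own frequency support
    # and map them back to parameter space with an orthonormal inverse DCT.
    spec = np.zeros(mask.shape)
    spec[mask] = coef
    return idctn(spec, norm="ortho")  # orthonormal, so Parseval holds exactly

shape, num_experts = (16, 16), 4
masks = disjoint_masks(shape, num_experts)
rng = np.random.default_rng(1)
d0 = expert_delta(rng.standard_normal(masks[0].sum()), masks[0])
d1 = expert_delta(rng.standard_normal(masks[1].sum()), masks[1])
print(np.dot(d0.ravel(), d1.ravel()))  # ~0: disjoint supports give orthogonal updates

Under this construction, a gradient step taken inside one expert's frequency subspace cannot perturb another expert's weight delta, which is the a priori orthogonality the abstract refers to.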