OneSparse: A Unified Framework for Sparse Activation Layers in Vision Models
Abstract
Sparse activation layers, primarily Mixture-of-Experts (MoE) and memory-based modules, are a central approach for scaling large models and are gaining traction in vision tasks. Despite conceptual similarities, these paradigms have evolved independently, hindering systematic comparison and the development of modules that exploit their complementary strengths. To bridge this gap, we propose OneSparse, a unified framework that reformulates MoE and memory modules under a common abstraction. This enables their systematic comparison and integration, revealing a continuous design space. Guided by this abstraction, we design the Nexus Layer, which features two key innovations: a unified routing mechanism that merges the efficiency of memory retrieval with MoE's load balancing to ensure stable and scalable token assignment, and an adaptive processing strategy in which memory modules sketch coarse representations while expert modules refine critical regions. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate that our Nexus Layer establishes a new performance-efficiency frontier, surpassing representative sparse baselines on both convolutional and transformer architectures. These results validate the power of the OneSparse framework to unify and integrate complementary sparse paradigms and underscore the potential of hybrid sparse modeling in vision.
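To make the unified routing idea concrete, the following is a minimal illustrative sketch (not the paper's implementation): tokens are scored against learned keys as in memory-based retrieval, assigned to their top-k slots as in sparse MoE routing, and regularized with a Switch-Transformer-style load-balancing term. All names (`unified_route`, `keys`, `top_k`) are hypothetical.

```python
import numpy as np

def unified_route(tokens, keys, top_k=2):
    """Hypothetical unified routing: memory-style key retrieval scores
    combined with an MoE-style load-balancing auxiliary loss."""
    # Memory-style retrieval: score each token against learned slot keys.
    scores = tokens @ keys.T                      # (n_tokens, n_slots)
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)     # softmax over slots
    # Sparse top-k assignment, as in MoE routing.
    assign = np.argsort(-probs, axis=1)[:, :top_k]
    # Load-balancing term: fraction of tokens routed to each slot times
    # the mean routing probability of that slot (minimized when uniform).
    n_slots = keys.shape[0]
    frac = np.bincount(assign.ravel(), minlength=n_slots) / assign.size
    balance_loss = n_slots * float(frac @ probs.mean(axis=0))
    return assign, balance_loss

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 16 tokens, 8-dim features
keys = rng.standard_normal((4, 8))      # 4 slots (experts/memory entries)
assign, loss = unified_route(tokens, keys)
```

In this sketch, the same key lookup drives both the sparse assignment and the balancing penalty, which is the sense in which retrieval efficiency and load balancing share one mechanism.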