Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification
Abstract
3D medical image classification is essential to modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. It scales efficiently to new tasks by adding only lightweight plugins (~1M parameters per task) to a single frozen backbone. Moreover, this versatile framework supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically evaluate state-of-the-art 3D classification techniques. Our analysis yields several key insights: (1) effective adaptation is critical to unlocking FM potential; (2) properly adapted general-purpose FMs can match medical-specific FMs; and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate that a single scalable framework can achieve state-of-the-art performance across diverse applications (e.g., 1st place in the *** challenge), eliminating the need for separate task-specific 3D models.