PDD: Manifold-Prior Diverse Distillation for Medical Anomaly Detection
Abstract
Medical image anomaly detection faces unique challenges: anomalies are subtle and heterogeneous, and they are embedded in complex anatomical structures. Through systematic Grad-CAM analysis, we show that discriminative activation maps, despite their success on industrial datasets, fail on medical data, motivating manifold-level modeling. We propose \textbf{PDD} (Manifold-Prior Diverse Distillation), a framework that unifies dual-teacher priors into a shared high-dimensional manifold and distills this knowledge into dual students with complementary behaviors. Specifically, frozen VMamba-Tiny and Wide-ResNet50 encoders provide global contextual and local structural priors, respectively. Their features are unified by a \textbf{Manifold Matching and Unification (MMU)} module, while an \textbf{Intra-Backbone Attention (InA)} module enriches intermediate representations. The unified manifold is distilled into two students: one performs layer-wise distillation via \textbf{InA} for local consistency, while the other receives skip-projected representations through a \textbf{Manifold Prior Affine (MPA)} module to capture cross-layer dependencies. A diversity loss prevents the two students' representations from collapsing into each other while preserving detection sensitivity. Extensive experiments on multiple medical datasets show that \textbf{PDD} outperforms prior state-of-the-art methods, improving AUROC by up to 11.8\%, 5.1\%, and 2.9\% on the HeadCT, BrainMRI, and ZhangLab datasets, respectively, and F1-max by 3.4\% on the Uni-Medical dataset.
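The abstract does not specify the form of the diversity loss; a minimal sketch, assuming a cosine-similarity penalty between the two students' flattened features (the function name and loss form are illustrative, not taken from the paper):

```python
import numpy as np

def diversity_loss(f1, f2, eps=1e-8):
    """Hypothetical diversity loss: penalizes high cosine similarity
    between the two students' per-sample features, discouraging the
    students from collapsing onto the same representation.
    f1, f2: arrays of shape (batch, ...) with matching shapes."""
    f1 = f1.reshape(f1.shape[0], -1)
    f2 = f2.reshape(f2.shape[0], -1)
    # L2-normalize each sample's feature vector
    f1 = f1 / (np.linalg.norm(f1, axis=1, keepdims=True) + eps)
    f2 = f2 / (np.linalg.norm(f2, axis=1, keepdims=True) + eps)
    cos = np.sum(f1 * f2, axis=1)      # per-sample cosine similarity
    return float(np.mean(cos ** 2))    # 0 when students are orthogonal
```

Under this assumed form, the loss is maximal when both students produce identical (or anti-aligned) features and vanishes when their representations are orthogonal, which is one common way to keep two distilled branches complementary.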