FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
Jin Cui ⋅ Boran Zhao ⋅ Jiajun Xu ⋅ Jiaqi Guo ⋅ Shuo Guan ⋅ Pengju Ren
Abstract
Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either (i) DNN-based, which are inherently coupled with network-specific parameters, inevitably introducing architectural bias and compromising generalization; or (ii) DNN-free, which rely on heuristics that lack rigorous theoretical guarantees for stability and accuracy. Neither approach explicitly constrains the distributional equivalence of the representative subsets, largely because continuous distribution matching is broadly considered inapplicable to discrete dataset sampling. Furthermore, prevalent distribution metrics (e.g., MSE, KL, MMD, and CE) are often incapable of accurately capturing higher-order moment differences. These deficiencies lead to suboptimal coreset performance, preventing the selected coreset from being truly equivalent to the original dataset.
In this work, we propose FAST (Frequency-domain Aligned Sampling via Topology), the first DNN-free distribution-matching coreset selection framework. FAST formulates coreset selection as a graph-constrained optimization problem grounded in spectral graph theory and employs the Characteristic Function Distance (CFD) to capture full distributional information (i.e., all moments and intrinsic correlations) in the frequency domain. We further discover that naive CFD suffers from a “vanishing phase gradient” issue in medium- and high-frequency regions; to address this, we introduce an Attenuated Phase-Decoupled CFD. Furthermore, for better convergence, we design a Progressive Discrepancy-Aware Sampling strategy that progressively schedules frequency selection from low to high. This preserves global structures before refining local details, enabling accurate matching with few frequencies while preventing overfitting.
Extensive experiments demonstrate that FAST significantly outperforms state-of-the-art coreset selection methods across all evaluated benchmarks, achieving an average accuracy gain of 9.12%. Compared to other baseline coreset methods, it reduces power consumption by 96.57% and achieves a 2.2$\times$ average speedup even on a CPU with 1.7 GB of memory, underscoring its high performance and energy efficiency.
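To make the idea of comparing a dataset and a subset in the frequency domain concrete, the following is a minimal sketch of a plain (naive) characteristic function distance between two point sets. It is not the FAST algorithm: the Attenuated Phase-Decoupled CFD, the graph constraint, and the progressive frequency schedule are not reproduced here. The function names (`ecf`, `cfd`, `sample_freqs`), the Gaussian frequency sampling, and the toy 2-D data are all assumptions made for illustration.

```python
import cmath
import random


def ecf(samples, t):
    """Empirical characteristic function at frequency vector t:
    the sample mean of exp(i * <t, x>)."""
    total = 0j
    for x in samples:
        dot = sum(ti * xi for ti, xi in zip(t, x))
        total += cmath.exp(1j * dot)
    return total / len(samples)


def cfd(full, subset, freqs):
    """Naive CFD (illustrative): mean squared modulus of the difference
    between the two empirical characteristic functions over `freqs`."""
    return sum(abs(ecf(full, t) - ecf(subset, t)) ** 2 for t in freqs) / len(freqs)


def sample_freqs(rng, dim, scale, count):
    """Hypothetical helper: draw frequency vectors from N(0, scale^2 I).
    A larger `scale` probes higher frequencies (finer detail)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)

# Toy "dataset": 2000 points from a 2-D standard Gaussian.
full = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]

# Two candidate subsets of the same size: a uniform random one, and a
# deliberately biased one holding only the largest first coordinates.
uniform_subset = rng.sample(full, 200)
biased_subset = sorted(full, key=lambda p: p[0])[-200:]

freqs = sample_freqs(rng, dim=2, scale=1.0, count=64)
d_uniform = cfd(full, uniform_subset, freqs)
d_biased = cfd(full, biased_subset, freqs)

print(f"CFD(full, uniform subset) = {d_uniform:.5f}")
print(f"CFD(full, biased subset)  = {d_biased:.5f}")
```

In this toy setting the biased subset should score a noticeably larger distance than the uniform one, which is the property a distribution-matching selection criterion relies on. A low-to-high frequency schedule in the spirit of the abstract could be approximated by calling `sample_freqs` with an increasing `scale`, though how FAST actually schedules frequencies is not specified here.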