DMGD: Training-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models
Qichao Wang ⋅ Yunhong Lu ⋅ Hengyuan Cao ⋅ Junyi Zhang ⋅ Min Zhang
Abstract
Dataset distillation enables efficient training by distilling the information of a large-scale dataset into a much smaller synthetic dataset. Diffusion-based paradigms have emerged in recent years, offering new perspectives on dataset distillation. However, they typically require additional fine-tuning stages, and effective guidance mechanisms remain underexplored. To address these limitations, we rethink diffusion-based dataset distillation and propose the Dual Matching Guided Diffusion (DMGD) framework, centered on efficient training-free guidance. We develop a theoretical framework for guidance design, proving that optimizing distributional distance under semantic alignment equivalently tightens the upper bound on the dataset distillation objective. Building on this result, we first establish **Semantic Matching** via conditional likelihood optimization, eliminating the need for auxiliary classifiers, and further propose a dynamic guidance mechanism that enhances the diversity of the synthetic data while maintaining semantic alignment. In parallel, we introduce an optimal transport (OT) based **Distribution Matching** approach to further align the synthetic data with the structure of the target distribution. To ensure efficiency, we develop two strategies tailored to the diffusion-based framework, Distribution Approximate Matching and Greedy Progressive Matching, which enable effective distribution-matching guidance with minimal computational overhead. Experimental results on ImageNet-Woof, ImageNet-Nette, and ImageNet-1K demonstrate that our training-free approach achieves significant improvements, outperforming state-of-the-art (SOTA) methods that require additional fine-tuning by average accuracy gains of $2.1\%$, $5.4\%$, and $2.4\%$, respectively.
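The abstract names OT-based Distribution Matching as a training-free guidance signal but does not spell out its form. As a rough illustration only (not the paper's actual method), the sketch below shows how an entropic-regularized OT (Sinkhorn) loss between features of a synthetic batch and a real batch could steer a reverse-diffusion step. The names `denoise_step`, `feature_extractor`, `real_feats`, and `scale` are hypothetical placeholders introduced here for illustration.

```python
# Minimal sketch of OT-based distribution-matching guidance during diffusion
# sampling. All helper names are assumptions, not the paper's API.
import torch

def sinkhorn_distance(x, y, eps=0.05, n_iters=50):
    """Entropic-regularized OT distance between two feature point clouds."""
    cost = torch.cdist(x, y, p=2) ** 2                 # pairwise squared-Euclidean cost
    a = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
    b = torch.full((y.size(0),), 1.0 / y.size(0), device=y.device)
    K = torch.exp(-cost / eps)                         # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                           # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u + 1e-8)
        u = a / (K @ v + 1e-8)
    transport = torch.diag(u) @ K @ torch.diag(v)      # approximate transport plan
    return (transport * cost).sum()

def guided_step(x_t, t, denoise_step, feature_extractor, real_feats, scale=1.0):
    """One reverse-diffusion step nudged toward the real feature distribution."""
    x_t = x_t.detach().requires_grad_(True)
    syn_feats = feature_extractor(x_t)                 # features of current synthetic batch
    loss = sinkhorn_distance(syn_feats, real_feats)    # distribution-matching loss
    grad = torch.autograd.grad(loss, x_t)[0]
    with torch.no_grad():
        x_prev = denoise_step(x_t, t) - scale * grad   # gradient-guided update, no fine-tuning
    return x_prev
```

Under these assumptions, the guidance is applied at each sampling step of a frozen pretrained diffusion model; the Sinkhorn iterations are cheap relative to a denoiser forward pass, which is consistent with the abstract's emphasis on minimal computational overhead.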