LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
Abstract
We demonstrate that in knowledge distillation for diffusion models, the teacher network's highly complex denoising process, which stems from its substantially larger capacity, poses a significant challenge for the student model to mimic faithfully. To address this problem, we propose a coarse-to-fine distillation framework with LInear FiTting-based distillation (LIFT) and Piecewise Local Adaptive Coefficient Estimation (PLACE). First, LIFT decomposes the objective into a ``coarse'' alignment and a ``fine'' refinement. The student is first trained on the coarse alignment before proceeding to the harder fine refinement. Second, PLACE extends LIFT to address spatially non-uniform errors by partitioning outputs into error-based groups, providing locally adaptive guidance. Our comprehensive experimental results demonstrate that our method, \core~with \pick, outperforms previous knowledge distillation approaches on diffusion models based on both U-Net and DiT architectures. Furthermore, as compression rates become exceedingly high, conventional knowledge distillation fails to provide sufficient guidance, preventing lightweight diffusion models from training stably. In contrast, our method converges stably even under such extreme compression ratios.
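To make the coarse-to-fine decomposition concrete, the following is a minimal sketch under assumed notation; the symbols $f_T$, $f_S$, $a$, $b$, and the groups $G_k$ are illustrative and need not match the paper's exact formulation. With teacher and student denoising outputs $f_T(x_t, t)$ and $f_S(x_t, t)$, LIFT's linear fitting could take the form
\begin{align}
(a^\ast, b^\ast) &= \arg\min_{a,\,b}\; \bigl\| a\, f_S(x_t, t) + b - f_T(x_t, t) \bigr\|_2^2, \\
\mathcal{L}_{\text{coarse}} &= \bigl\| a^\ast f_S(x_t, t) + b^\ast - f_T(x_t, t) \bigr\|_2^2, \qquad
\mathcal{L}_{\text{fine}} = \bigl\| f_S(x_t, t) - f_T(x_t, t) \bigr\|_2^2,
\end{align}
so the coarse stage only asks the student to match the teacher up to an affine transform, while the fine stage demands a direct match. Under PLACE, the output locations would additionally be partitioned into error-based groups $G_1, \dots, G_K$, with a separate coefficient pair $(a_k^\ast, b_k^\ast)$ estimated within each group, yielding the locally adaptive guidance described above.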