LogCD: Local-to-global Consistency Distillation for Few-step Image Generation
Abstract
Distilling latent diffusion models (LDMs) and rectified flow models (RFMs) into versions that sample quickly from conditions has attracted considerable interest. However, most existing methods either require substantial training resources or suffer quality degradation, especially in text-image alignment. To address these challenges, we propose Local-to-global Consistency Distillation (LogCD), which accelerates LDMs/RFMs via two-stage distillation. LogCD first performs local consistency distillation and then global consistency distillation to enforce consistency along the entire inference path. In addition, a latent Learned Perceptual Image Patch Similarity (LPIPS) model is exploited to enhance perceptual consistency. Notably, LogCD is highly flexible: a single unified model operates with 2 to 4 sampling steps, and its performance improves seamlessly as the number of steps increases within this range. With only 70 A100 GPU hours, LogCD accelerates SDXL to a 33.5 CLIP score with just 3 sampling steps, surpassing state-of-the-art accelerated models that use even more steps. FLUX.1-dev accelerated by LogCD achieves performance comparable to its 25-step teacher with 4-step sampling, reaching a CLIP score of 32.6.