Heterogeneous Decentralized Diffusion Models
Zhiying Jiang ⋅ Raihan Seraj ⋅ Marcos Villagra ⋅ Bidhan Roy
Abstract
Training state-of-the-art diffusion models requires massive computational resources concentrated in tightly coupled clusters, fundamentally limiting participation to well-resourced institutions. While Decentralized Diffusion Models (DDM) enable training multiple experts in isolation, existing approaches require 1176 GPU-days and homogeneous training objectives across all experts. We present an efficient framework that dramatically reduces resource requirements while supporting heterogeneous training objectives. Our approach combines three key contributions: (1) PixArt-$\alpha$'s efficient AdaLN-Single architecture, which reduces parameters while maintaining quality; (2) pretrained checkpoint conversion from ImageNet-DDPM to Flow Matching objectives, accelerating convergence and enabling initialization without objective-specific pretraining; and (3) a training-free inference conversion framework that unifies heterogeneous expert predictions (DDPM and Flow Matching) into a common velocity space without any retraining. Experiments on LAION-Aesthetics demonstrate that our decentralized approach achieves comparable results with a 16$\times$ compute reduction (72 vs. 1176 GPU-days) and a 14$\times$ data reduction (11M vs. 158M images). Our heterogeneous variant mixing DDPM and Flow Matching experts exhibits complementary specialization patterns, improving generation diversity and texture quality despite modest FID increases. By eliminating synchronization requirements and enabling arbitrary objective combinations, our framework democratizes large-scale generative model training, allowing contributors with diverse resources to participate using consumer GPUs with only 20-48GB of VRAM.
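To make the third contribution concrete: the abstract does not spell out the conversion itself, but a standard identity links the two parameterizations. Under the DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, an $\varepsilon$-prediction yields a clean-image estimate $\hat{x}_0$, and under a linear (rectified-flow) path $x_s = (1-s)x_0 + s\varepsilon$ the target velocity is $\varepsilon - x_0$. The sketch below, which is our illustration rather than the paper's exact procedure (the helper name `ddpm_eps_to_velocity` and the choice of a linear path are assumptions), shows how a DDPM expert's output can be mapped into that common velocity space without retraining:

```python
import numpy as np

def ddpm_eps_to_velocity(x_t, eps_hat, alpha_bar_t):
    """Map a DDPM epsilon-prediction to a flow-matching velocity (illustrative).

    Assumes the DDPM forward process
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    and a linear (rectified-flow) path whose target velocity is eps - x_0.
    """
    # Recover the clean-sample estimate implied by the epsilon-prediction.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    # Velocity of the linear interpolation path between x_0 and noise.
    return eps_hat - x0_hat

# Sanity check: with the true eps, the conversion reproduces eps - x_0.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)
alpha_bar = 0.7
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
v = ddpm_eps_to_velocity(x_t, eps, alpha_bar)
assert np.allclose(v, eps - x0)
```

Because the conversion is a closed-form reparameterization of quantities each expert already predicts, it requires no gradient updates, which is what allows heterogeneous experts to be combined at inference time.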