When Local Rules Create Global Order: Self-Organized Representation Learning for Latent Diffusion Models
Abstract
This work studies how latent space structure impacts the performance of Latent Diffusion Models (LDMs). We show that effective generation requires a latent space that is simultaneously locally smooth, enabling stable and reliable reconstruction, and globally dispersive, allowing the model to draw diverse and meaningful samples without collapsing into narrow regions. Existing approaches, however, often emphasize smoothness alone, which can concentrate latent codes in small regions and limit exploration of the broader space. To address these limitations, we propose Self-Organized Representation Learning (SORL), a bottom-up training paradigm inspired by self-organization in complex systems, where global structure emerges naturally from simple local interactions. Rather than explicitly imposing the critical latent properties of smoothness and maximal dispersity, SORL promotes them through two complementary local mechanisms: local attraction, which encourages coherent reconstructions among nearby latent codes, and local repulsion, which prevents latent codes from collapsing into dense clusters. Through their interaction, SORL induces a latent manifold that is both locally smooth and globally dispersive, leading to improved reconstruction and generation.
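To make the two local mechanisms concrete, the following is a minimal illustrative sketch, not the paper's actual objective: an attraction term computed over each latent code's k nearest neighbors (pulling nearby codes toward coherent local structure) and a hinge-style repulsion term that penalizes pairs of codes closer than a margin (discouraging collapse into dense clusters). The function name, the choice of k, and the margin are all hypothetical.

```python
import numpy as np

def sorl_local_losses(z, k=4, margin=1.0):
    """Illustrative sketch of local attraction/repulsion terms.

    z: (n, d) array of latent codes.
    Returns (attraction, repulsion) as scalars.
    """
    n = z.shape[0]
    # Pairwise squared Euclidean distances between latent codes.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances

    # Local attraction: mean squared distance to each code's
    # k nearest neighbors (smaller = locally smoother).
    attraction = np.sort(d2, axis=1)[:, :k].mean()

    # Local repulsion: hinge penalty on pairs closer than `margin`
    # (larger when codes crowd together; zero once well separated).
    d = np.sqrt(d2)
    close = np.maximum(0.0, margin - d)  # diagonal is -inf -> clipped to 0
    repulsion = (close ** 2).sum() / (n * (n - 1))
    return attraction, repulsion
```

For fully collapsed codes the attraction term vanishes while the repulsion term is maximal; for well-dispersed codes the repulsion term drops to zero, illustrating how the two forces act in opposition.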