Breaking Semantic Boundaries: Distribution-Guided Semantic Exploration for Creative Generation
Abstract
Text-to-image (T2I) diffusion models effectively produce semantically aligned images, but their reliance on training distributions constrains their capacity to synthesize truly novel, out-of-distribution concepts. Existing methods attempt to enhance creativity through semantic exploration, such as fusing known concept pairs, but the resulting images remain linguistically describable and confined to familiar semantic spaces. Inspired by the soft probabilistic outputs that classifiers produce on novel or out-of-distribution inputs, we propose Distribution-Conditional Generation, a paradigm that formulates novel-concept creation as image synthesis conditioned on class distributions, enabling controllable yet semantically unconstrained creative generation. Building on this paradigm, we introduce DisTok, an encoder–decoder framework that unifies conditional and unconditional creative generation by decoding latent representations (either randomly sampled or mapped from conditions, e.g., class distributions) into tokens representing novel concepts. DisTok is trained by iteratively sampling and fusing concept pairs from a dynamic pool to model progressively complex distributions, while a vision-language model enforces semantic consistency by aligning the class distributions of generated images with the input distributions. Extensive experiments demonstrate that DisTok enables efficient and flexible semantic exploration for token-level creative synthesis, achieving state-of-the-art text–image alignment and human preference.
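To make the described training recipe concrete, the following is a minimal PyTorch sketch of one distribution-conditional training step under our reading of the abstract. All module and variable names (`encoder`, `decoder`, `scorer`, `fuse_pair`, `pool`) are hypothetical placeholders, not the authors' implementation; in particular, a frozen linear scorer stands in for the full pipeline of rendering the token with a frozen T2I model and classifying the result with a vision-language model, and the dynamic growth of the concept pool is elided.

```python
# Hypothetical sketch of one DisTok-style training step (all names are
# placeholders, not the paper's code). A fused class distribution is encoded
# to a latent, decoded into a concept token, and a frozen scorer (standing in
# for "render with frozen T2I, classify with a VLM") predicts class
# probabilities that are aligned with the input distribution via a KL loss.
import torch
import torch.nn.functional as F
from torch import nn

NUM_CLASSES, LATENT_DIM, TOKEN_DIM = 100, 64, 128

encoder = nn.Sequential(                        # class distribution -> latent
    nn.Linear(NUM_CLASSES, LATENT_DIM), nn.GELU(),
    nn.Linear(LATENT_DIM, LATENT_DIM))
decoder = nn.Linear(LATENT_DIM, TOKEN_DIM)      # latent -> novel-concept token
scorer = nn.Linear(TOKEN_DIM, NUM_CLASSES).requires_grad_(False)  # frozen stand-in

# Concept pool, initialized with one-hot class distributions; the paper's
# dynamic pool would also absorb fused concepts over time (omitted here).
pool = torch.eye(NUM_CLASSES)

def fuse_pair(pool: torch.Tensor) -> torch.Tensor:
    """Sample two concepts from the pool and fuse their class distributions."""
    i, j = torch.randperm(len(pool))[:2]
    w = torch.rand(()) * 0.4 + 0.3             # mixing weight in [0.3, 0.7]
    return w * pool[i] + (1 - w) * pool[j]     # convex mix is still a distribution

opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-4)
for step in range(3):                           # a few illustrative steps
    target = fuse_pair(pool).unsqueeze(0)       # (1, NUM_CLASSES) input distribution
    token = decoder(encoder(target))            # token representing the novel concept
    pred = scorer(token).log_softmax(dim=-1)    # predicted class log-probabilities
    loss = F.kl_div(pred, target, reduction="batchmean")  # align with input distribution
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: kl = {loss.item():.4f}")
```

Under these assumptions, the unconditional mode described in the abstract would correspond to feeding the decoder a randomly sampled latent instead of an encoded distribution.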