Semantic Alignment for Pose-Invariant Identity Preserving Diffusion
Abstract
Recent text-to-image (T2I) diffusion models have evolved to accept multiple control conditions, including structure, appearance, and text prompts. Despite this progress, training-based methods demand heavy computation, whereas training-free methods often 're-imagine' the subject to satisfy the given structure, thereby compromising identity preservation and attenuating fine textures. We propose SeAl (Semantic Alignment for Pose-Invariant Identity Preserving Diffusion), a novel training-free framework that addresses the 're-imagining' problem from the perspective of 'infusion'. SeAl integrates structure, appearance, and text prompt through three modules: AnchorAlign pre-aligns spatial discrepancies, Reference-guided Appearance Infusion injects identity via semantic matching, and Delta-Bridge leverages the guidance delta to mediate text–appearance conflicts. We demonstrate that our method faithfully reflects all three control factors and substantially reduces the identity leakage endemic to prior methods. Notably, SeAl excels on challenging datasets where identity preservation typically fails (e.g., distinctive animal features or complex human attire), establishing a new paradigm for training-free identity preservation in diffusion models.