Semantic Derivative Flow: Graph-Guided Diffusion for Controllable Instance Interactions
Abstract
Despite remarkable progress in text-to-image diffusion models, controlling the semantic and spatial relationships between interacting instances remains a fundamental challenge. Existing methods that inject spatial constraints often fail to model the functional dependencies between entities, which leads to implausible interactions. In this paper, we introduce Semantic Derivative Flow (SDF), a graph-guided framework that structures the diffusion process around a directed acyclic interaction graph. Our core contribution is a theoretically motivated derivative attention mechanism, which explicitly requires the semantic representation of a predicate to be derived from its subject, and that of the object to be derived from the predicate, thereby formalizing a differentiable semantic graph. This constraint compels the generative process to follow the logical chain of the interaction (subject, then predicate, then object). We further integrate a global context node and a real-time regional refinement module to ground the graph holistically in the visual domain. Extensive experiments demonstrate that our model, an instantiation of SDF, establishes a new state of the art in fidelity and controllability on the HICO-DET benchmark. We complement our empirical results with a theoretical analysis that frames our method as structured message passing on interaction graphs, providing a principled justification for its efficacy and generalization benefits.
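To make the subject-to-predicate-to-object dependency concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes that the constraint can be approximated by masking ordinary scaled dot-product attention so that each node attends only to itself and to its parents in the interaction graph. The role names, the edge set (including the assumption that the global context node feeds every other node), and the helper functions `build_mask` and `masked_attention` are all hypothetical.

```python
import math

# Hypothetical node roles: one interaction triplet plus a global context node.
ROLES = ["global", "subject", "predicate", "object"]

# Assumed directed edges (source, target). Information flows along the chain
# subject -> predicate -> object; the global node is assumed to feed every node.
EDGES = {
    ("subject", "predicate"),
    ("predicate", "object"),
    ("global", "subject"),
    ("global", "predicate"),
    ("global", "object"),
}


def build_mask(roles, edges):
    """Return mask[i][j], True when node i may attend to node j."""
    n = len(roles)
    mask = [[False] * n for _ in range(n)]
    for i, target in enumerate(roles):
        for j, source in enumerate(roles):
            # A node always sees itself; otherwise it sees only its parents.
            mask[i][j] = (i == j) or ((source, target) in edges)
    return mask


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to the allowed mask entries."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        allowed = [j for j in range(len(keys)) if mask[i][j]]
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in allowed
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][k] for w, j in zip(weights, allowed))
            for k in range(len(values[0]))
        ])
    return outputs


# Toy one-hot features: with values equal to the identity matrix, row i of the
# output is exactly the attention distribution of node i over all nodes.
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
mask = build_mask(ROLES, EDGES)
attended = masked_attention(features, features, features, mask)
```

In this toy setup the object node places zero weight on the subject node: under the assumed edge set, any subject information must reach the object through the predicate, which is the chain structure described above. Because edges only point forward along the chain, the mask also encodes an acyclic graph.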