DynBridge: Bridging Imagination and Control through Interaction Dynamics for Robot Manipulation
Abstract
Recent generative models allow robots to imagine future visual outcomes for action guidance, yet most still treat imagination and control independently, yielding visually coherent rollouts but physically inconsistent behaviors. While structural priors improve spatial grounding, such methods remain driven by visual correlation rather than causal interaction, overlooking the bidirectional coupling between robot actions and the evolving environment. We formalize this coupling as interaction dynamics, which specify where environmental changes occur and how actions cause them. Building on this formulation, we introduce DynBridge, an end-to-end framework that unifies imagination and control through a shared dynamics representation. Concretely, DynBridge comprises three components: (1) an Interaction Dynamics Generator that forecasts interaction dynamics via joint trajectory generation and action prediction; (2) an Action-Conditioned Dynamics Aggregator that integrates the forecast dynamics with control signals; and (3) a Dynamics-Guided Action Predictor that leverages the aggregated dynamics to produce executable, context-aware actions. Experiments show that DynBridge consistently outperforms prior methods on both simulated and real-world benchmarks without external pretraining.
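To make the three-stage pipeline concrete, the following is a minimal, purely illustrative sketch of the data flow between the components named in the abstract. All class names, method names, and tensor operations here are assumptions for exposition; they are not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the DynBridge data flow; every name and operation
# below is illustrative, not taken from the paper's implementation.

class InteractionDynamicsGenerator:
    """Forecasts interaction dynamics: where changes occur (a trajectory)
    and how actions cause them (a draft action)."""
    def forecast(self, obs):
        trajectory = obs + 0.1        # placeholder future-trajectory forecast
        draft_action = np.tanh(obs)   # placeholder coarse action prediction
        return trajectory, draft_action

class ActionConditionedDynamicsAggregator:
    """Fuses the forecast dynamics with the control signal."""
    def aggregate(self, trajectory, draft_action):
        # A real model would use learned attention; concatenation stands in here.
        return np.concatenate([trajectory, draft_action])

class DynamicsGuidedActionPredictor:
    """Maps the aggregated dynamics to an executable action."""
    def predict(self, dynamics):
        return dynamics.mean(keepdims=True)  # stand-in for a learned policy head

def dynbridge_step(obs):
    """One imagination-to-control step through the three components."""
    generator = InteractionDynamicsGenerator()
    aggregator = ActionConditionedDynamicsAggregator()
    predictor = DynamicsGuidedActionPredictor()
    trajectory, draft_action = generator.forecast(obs)
    dynamics = aggregator.aggregate(trajectory, draft_action)
    return predictor.predict(dynamics)

action = dynbridge_step(np.zeros(4))
```

The point of the sketch is only the shared-representation wiring: the same forecast dynamics condition both the aggregation and the final action prediction, rather than imagination and control running as separate pipelines.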