Explicit Recovery Behavior for Diffusion Policies
Abstract
Diffusion policies have emerged as a powerful paradigm for robot learning, but their inherent multi-modality can yield a diverse set of plausible, though not always optimal, actions from a single observation. We posit that for a given task, an optimal action exists within this distribution. Inspired by negative prompting in generative models, we introduce a novel method that leverages an error detector to identify out-of-distribution (OOD) execution histories and uses them to construct negative action prompts. This allows our policy to steer away from suboptimal behaviors and converge toward higher-performing actions. We present a comprehensive ablation study demonstrating the effectiveness of positive and negative prompts, and validate our approach on a suite of simulated benchmarks and real-world robotic tasks. Our results show that the proposed Negative-Prompt-guided Diffusion Policy achieves significant improvements in task performance by effectively filtering undesirable action modes.
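To make the guidance mechanism concrete, below is a minimal sketch of one way negative action prompts could steer a single reverse-diffusion step, following standard classifier-free guidance with the negative prompt substituted for the unconditional branch. This is an illustrative assumption, not the paper's actual interface: the names `eps_model`, `pos_cond`, and `neg_cond` (a noise-prediction network, an embedding of desirable context, and an embedding built from detected OOD failure histories) are hypothetical.

```python
import torch

def negative_prompt_guidance(eps_model, x_t, t, pos_cond, neg_cond, scale=2.0):
    """One denoising step with negative-prompt guidance (a sketch).

    Assumes `eps_model(x_t, t, cond)` returns a noise prediction for the
    action trajectory `x_t` at diffusion step `t` under conditioning `cond`.
    `pos_cond` encodes desirable context; `neg_cond` encodes failure (OOD)
    execution histories flagged by an error detector.
    """
    eps_pos = eps_model(x_t, t, pos_cond)  # noise prediction under the positive prompt
    eps_neg = eps_model(x_t, t, neg_cond)  # noise prediction under the negative prompt
    # Extrapolate away from the negative mode and toward the positive one,
    # suppressing action modes that resemble past failures.
    return eps_neg + scale * (eps_pos - eps_neg)
```

Larger values of `scale` push the denoised actions further from the failure modes encoded by the negative prompt, at the usual cost of reduced sample diversity.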