RewardFlow: Generate Images by Optimizing What You Reward
Abstract
RewardFlow is a zero-shot, training-free framework for text-guided image editing and generation based on reward-guided Langevin dynamics. We steer pretrained diffusion and flow-matching models at inference time with a diverse set of differentiable rewards, and control their influence through a prompt-aware adaptive policy that parses the text instruction, infers the edit intent, and adaptively scales the guidance step size. Our design includes a differentiable VQA-based reward for fine-grained semantic supervision and a SAM-guided reward for precise, localized edits with minimal leakage outside the target region. Across standard image-editing and compositional-generation benchmarks, RewardFlow achieves state-of-the-art zero-shot edit fidelity and compositional alignment. Code and an open-source demo will be released upon acceptance.
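To make the core idea concrete, the sketch below shows a generic reward-guided Langevin update on a toy 2-D example. This is a minimal illustration, not the paper's implementation: `prior_score` stands in for a pretrained model's score function, `reward` and `reward_grad` for a differentiable reward (e.g. a VQA- or SAM-based one), and the fixed weight `lam` for the prompt-aware adaptive policy, all of which are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = np.array([2.0, -1.0])  # hypothetical reward optimum

def prior_score(x):
    # Score of a standard Gaussian prior; a stand-in for a
    # pretrained diffusion/flow model's learned score.
    return -x

def reward(x):
    # Toy differentiable reward: higher when x is near TARGET.
    return -np.sum((x - TARGET) ** 2)

def reward_grad(x):
    # Gradient of the toy reward with respect to x.
    return -2.0 * (x - TARGET)

def langevin_step(x, eta, lam):
    # Reward-guided Langevin update: drift along the model score
    # plus a weighted reward gradient, perturbed by Gaussian noise.
    noise = rng.normal(size=x.shape)
    drift = prior_score(x) + lam * reward_grad(x)
    return x + eta * drift + np.sqrt(2.0 * eta) * noise

x = rng.normal(size=2)
for _ in range(500):
    # In RewardFlow, an adaptive policy would modulate lam per prompt;
    # here it is held fixed for simplicity.
    x = langevin_step(x, eta=0.01, lam=5.0)
```

With a fixed weight, the chain samples from a distribution proportional to `prior * exp(lam * reward)`, so iterates concentrate near a compromise between the prior mode and the reward optimum; the paper's adaptive policy would instead reweight this trade-off per instruction.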