RebRL: Reinforcing Discrete Visual Diffusion Models with Rebalanced Timestep Credits
Mu Zhang ⋅ Tianren Ma ⋅ Yunfan Liu ⋅ Kun Hu ⋅ Qixiang Ye
Abstract
Discrete Diffusion Models (DDMs) have shown great potential in image generation, especially when equipped with reinforcement learning (RL) techniques. However, our experiments reveal a fundamental yet overlooked limitation: severe imbalance of credit assignment across timesteps during training. As a result, early generation timesteps, which carry higher exploration potential and determine the global structure, contribute less to policy optimization. To address this, we propose a simple yet effective approach that Re-balances timestep credit in Reinforcement Learning (RebRL), achieving a better exploration-exploitation trade-off and more efficient training of DDMs. RebRL is plug-and-play: it simply replaces the uniform temporal policy with strategic rebalancing along masking stages. RebRL is also analytically plausible: derivation and analysis show that it enjoys a uniform token-level policy gradient, which benefits policy optimization. Experiments on text-to-image generation benchmarks show that RebRL achieves state-of-the-art performance on GenEval and improves the human preference score by up to $\textbf{3.40}$, while reducing training steps by $\textbf{$\sim$40\%}$. Code is enclosed in the supplementary material.
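As a rough illustration of the rebalancing idea described in the abstract, the sketch below contrasts uniform timestep sampling with a hypothetical rebalanced scheme that upweights early, heavily masked generation stages. The function names, the linear unmasking schedule, and the `alpha` exponent are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
import torch

def uniform_timestep_weights(num_steps: int) -> torch.Tensor:
    """Baseline: every generation timestep receives equal credit."""
    return torch.full((num_steps,), 1.0 / num_steps)

def rebalanced_timestep_weights(num_steps: int, alpha: float = 1.0) -> torch.Tensor:
    """Upweight early generation steps, where most tokens are still masked
    and exploration shapes the global image structure.

    Assumes a linear unmasking schedule: the mask ratio at step s is roughly
    1 - s / num_steps. The power-law weighting is an illustrative choice.
    """
    s = torch.arange(num_steps, dtype=torch.float32)
    mask_ratio = 1.0 - s / num_steps        # step 0 is fully masked
    weights = mask_ratio.clamp_min(1e-6) ** alpha
    return weights / weights.sum()

# Draw timesteps for a policy-gradient update under each scheme.
num_steps, batch_size = 20, 8
uniform_t = torch.multinomial(
    uniform_timestep_weights(num_steps), batch_size, replacement=True)
rebalanced_t = torch.multinomial(
    rebalanced_timestep_weights(num_steps), batch_size, replacement=True)
```

Under the rebalanced weights, early masking stages are sampled more often during RL training, so their policy-gradient contribution is no longer drowned out by the many late, nearly unmasked steps.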