Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage
Abstract
Most post-training methods for text-to-image samplers focus on the model weights: either fine-tuning the backbone for alignment or distilling it for few-step efficiency. We take a different route: rescheduling the sampling timeline of a frozen sampler. Instead of a fixed, global schedule, we learn instance-level (prompt- and noise-conditioned) schedules through a single-pass Dirichlet policy. To ensure accurate gradient estimates in high-dimensional policy learning, we introduce a novel reward baseline based on a principled James–Stein estimator; it provably achieves lower estimation error than commonly used variants and leads to superior results. Our rescheduled samplers consistently improve text–image alignment, including text rendering and compositional control, across modern Stable Diffusion and Flux model families. Additionally, a 5-step Flux-Dev sampler with our schedules attains generation quality comparable to deliberately distilled samplers such as Flux-Schnell. We thus position our scheduling framework as an emerging model-agnostic post-training lever that unlocks additional generative potential in pretrained samplers.
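To make the James–Stein baseline idea concrete, below is a minimal, illustrative sketch (not the paper's actual implementation) of positive-part James–Stein shrinkage applied to per-instance reward estimates before they are used as a REINFORCE baseline. The function name, the reward values, and the `noise_var` parameter are all hypothetical; the sketch assumes a batch of noisy reward means that are shrunk toward their grand mean, which classically dominates the raw estimates in squared error when the batch is large enough.

```python
def james_stein_shrink(means, noise_var):
    """Positive-part James-Stein shrinkage of noisy per-instance reward
    means toward their grand mean (illustrative sketch, not the paper's
    exact estimator)."""
    d = len(means)
    grand = sum(means) / d
    centered = [m - grand for m in means]
    ss = sum(c * c for c in centered)
    # Positive-part shrinkage factor; (d - 3) rather than (d - 2)
    # because one degree of freedom is spent estimating the grand mean.
    factor = max(0.0, 1.0 - (d - 3) * noise_var / ss) if ss > 0 else 0.0
    return [grand + factor * c for c in centered]

# Hypothetical usage: shrunk estimates serve as the REINFORCE baseline,
# so advantages are rewards minus the shrunk baseline.
rewards = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
baseline = james_stein_shrink(rewards, noise_var=0.05)
advantages = [r - b for r, b in zip(rewards, baseline)]
```

Shrinking toward the grand mean pulls each baseline entry closer to the batch average without changing the average itself, which is the variance-reduction effect the abstract alludes to.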