Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
Abstract
Diffusion and flow-based models typically allocate compute uniformly across space, updating every patch with the same noise level and number of steps. However, images are highly heterogeneous, and not all regions are equally difficult to denoise. We introduce Patch Forcing (PF), a framework that dynamically allocates compute to the regions that need the most refinement. An additional head predicts per-patch difficulty, which lets us formulate adaptive samplers that concentrate denoising steps where they are most needed. Because noise scales can vary over both space and diffusion time, our adaptive solvers can advance easier regions earlier, providing context for harder ones. Our framework achieves competitive results on class-conditional ImageNet while remaining orthogonal to guidance methods, and it further scales to text-to-image synthesis. With Patch Forcing we hope to open a path toward a new family of samplers that allocate compute adaptively, focusing effort on the hardest parts of an image.
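The core idea of difficulty-aware allocation can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the proportional step-budget rule, the `allocate_steps` and `per_patch_schedule` helpers, and the geometric noise schedules are all hypothetical stand-ins for whatever the difficulty head and adaptive solver actually use.

```python
import numpy as np

def allocate_steps(difficulty, total_steps, min_steps=2):
    """Distribute a denoising-step budget across patches in proportion to
    predicted difficulty (a hypothetical output of the difficulty head)."""
    weights = difficulty / difficulty.sum()
    # Harder patches get more steps; every patch keeps a small minimum.
    return np.maximum(min_steps, np.round(weights * total_steps)).astype(int)

def per_patch_schedule(steps, sigma_max=1.0, sigma_min=0.01):
    """Give each patch its own noise schedule. Easier patches take fewer,
    coarser steps and so reach low noise earlier, providing context for
    harder patches that are still being refined."""
    return [np.geomspace(sigma_max, sigma_min, n) for n in steps]

# Toy example: four patches, the second clearly hardest.
difficulty = np.array([0.1, 0.8, 0.2, 0.1])
steps = allocate_steps(difficulty, total_steps=32)
schedules = per_patch_schedule(steps)
```

The hardest patch receives the largest share of the 32-step budget, while each easier patch still descends its own (shorter) schedule from `sigma_max` to `sigma_min`.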