CVPR Poster Scaling Inference Time Compute for Diffusion Models


Nanye Ma · Shangyuan Tong · Haolin Jia · Hexiang Hu · Yu-Chuan Su · Mingda Zhang · Xuan Yang · Yandong Li · Tommi Jaakkola · Xuhui Jia · Saining Xie

ExHall D Poster #226

Highlight


Fri 13 Jun 8:30 a.m. PDT — 10:30 a.m. PDT

Abstract:

Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by scaling laws. Recent research has begun to explore inference-time scaling behavior in large language models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of function evaluations (NFE), although the performance gains typically flatten after a few dozen steps. In this work, we present a framework for inference-time scaling of diffusion models that enables them to benefit from increased computation beyond the NFE plateau. Specifically, we consider a search problem aimed at identifying better noises for the sampling process. We structure the design space along two axes: the verifiers used to provide feedback, and the algorithms used to find better noise candidates. Through extensive experiments on class-conditioned and text-conditioned image generation benchmarks, we find that increasing inference-time compute leads to substantial improvements in the quality of samples generated by diffusion models. Given the complicated nature of images, the components of the framework can be combined in ways chosen to suit different application scenarios.
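
The search formulation in the abstract can be illustrated with a minimal sketch. The abstract does not name specific verifiers or search algorithms, so the code below swaps in one of the simplest possible choices, a best-of-N random search over initial noises, purely for illustration. The names `toy_sampler`, `toy_verifier`, and `random_search` are hypothetical; the sampler and verifier are toy stand-ins, not a real diffusion model or a verifier from the paper.

```python
import random
from typing import Callable, List, Tuple

Noise = List[float]
Sample = List[float]


def toy_sampler(noise: Noise) -> Sample:
    """Hypothetical stand-in for a diffusion sampler.

    A real sampler would run many denoising steps; here a fixed,
    deterministic map from noise to "sample" is enough to show the search.
    """
    return [0.5 * z for z in noise]


def toy_verifier(sample: Sample) -> float:
    """Hypothetical verifier: higher is better.

    Rewards samples close to the all-ones vector (an arbitrary toy target).
    """
    return -sum((x - 1.0) ** 2 for x in sample)


def random_search(
    sampler: Callable[[Noise], Sample],
    verifier: Callable[[Sample], float],
    num_candidates: int,
    dim: int,
    seed: int = 0,
) -> Tuple[Noise, Sample, float]:
    """Best-of-N search: draw N Gaussian noises, keep the best-scoring one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    rng = random.Random(seed)
    best = None
    for _ in range(num_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        sample = sampler(noise)      # one full sampling run per candidate
        score = verifier(sample)     # verifier feedback for this candidate
        if best is None or score > best[2]:
            best = (noise, sample, score)
    return best


if __name__ == "__main__":
    # Larger search budgets can only match or improve the best verifier score,
    # because the same seed makes smaller budgets a prefix of larger ones.
    for budget in (1, 4, 16, 64):
        _, _, score = random_search(toy_sampler, toy_verifier, budget, dim=4)
        print(budget, round(score, 4))
```

In this sketch the inference-time compute grows with `num_candidates`, since each candidate costs one full sampling run plus one verifier call. The two axes of the design space map onto the two callables: a different verifier changes what "better" means, and a different search routine changes how candidates are proposed.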
