VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation
Abstract
Although rectified flow models have achieved remarkable performance in image, video, and 3D generation, their practical deployment is hindered by slow inference. Previous acceleration methods rely on caching and reusing intermediate values, neglecting the growing mismatch between static cached values and the evolving input, which reduces the fidelity of the generated content. This work proposes Velocity Decomposition and Estimation (VDE), a training-free acceleration method that shifts the paradigm from caching-and-reusing to decomposing-and-estimating. VDE periodically anchors the model's state with a full forward pass and estimates subsequent outputs analytically. VDE first decomposes the model's velocity output into components parallel and orthogonal to the input, then exploits the temporal predictability of the components' coefficients and the consistency of the orthogonal direction for precise, input-adaptive estimation at each timestep. Extensive experiments on image and video generation tasks demonstrate that VDE achieves 2.04× to 3.22× acceleration with minimal loss in visual quality. For example, in image generation, VDE achieves a 2.21× speedup while preserving nearly identical visual quality, improving on the best baseline by 19.5% in SSIM and 30.3% in PSNR while reducing LPIPS by 55.4%.
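To make the decomposition step concrete, the following is a minimal NumPy sketch of splitting a velocity tensor into a part parallel to the input and a part orthogonal to it. It is an illustration under stated assumptions, not the authors' implementation: the function names `decompose_velocity` and `estimate_velocity`, the symbols `alpha` and `beta`, the whole-tensor inner product, and the re-projection of the cached direction onto the new input are all hypothetical choices. The abstract does not say how the coefficients are predicted between anchor steps, so the sketch simply takes them as arguments.

```python
import numpy as np


def decompose_velocity(v, x, eps=1e-12):
    """Split v into alpha * x (parallel to x) plus beta * direction (orthogonal to x).

    Assumption: the inner product is taken over the whole flattened tensor.
    """
    x_flat = x.ravel()
    alpha = float(v.ravel() @ x_flat) / (float(x_flat @ x_flat) + eps)
    residual = v - alpha * x                  # part of v orthogonal to x
    beta = float(np.linalg.norm(residual))    # magnitude of the orthogonal part
    direction = residual / (beta + eps)       # unit orthogonal direction
    return alpha, beta, direction


def estimate_velocity(x_new, alpha, beta, direction, eps=1e-12):
    """Rebuild a velocity for a new input from coefficients and a cached direction.

    Assumption: the cached direction is re-projected so that it is orthogonal
    to x_new; alpha and beta are supplied by the caller (for example, from
    some extrapolation scheme that is not shown here).
    """
    x_flat = x_new.ravel()
    overlap = float(direction.ravel() @ x_flat) / (float(x_flat @ x_flat) + eps)
    d = direction - overlap * x_new
    d = d / (np.linalg.norm(d) + eps)
    return alpha * x_new + beta * d


# Toy tensors standing in for a model input and its velocity output.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))

alpha, beta, direction = decompose_velocity(v, x)
reconstructed = alpha * x + beta * direction
estimated_same_input = estimate_velocity(x, alpha, beta, direction)
```

With the same input and unchanged coefficients, the estimate reproduces the original velocity up to floating-point error. Any benefit at later timesteps depends on how well the coefficients and the orthogonal direction can be predicted, which this sketch does not model.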