D2Cache: Second-Order Delta Caching for Higher Video Diffusion Acceleration
Abstract
Video diffusion models achieve impressive visual fidelity but remain computationally prohibitive for real-time or interactive generation due to their sequential denoising process. Recent caching methods accelerate inference by reusing outputs across timesteps, typically estimating each new output from the first-order residual, i.e., the difference between adjacent model predictions. To mitigate the error accumulated by such caching methods, we propose D2Cache, a training-free method that leverages the smoothness of the second-order residual delta, the temporal difference between consecutive first-order residuals, to predict future timesteps more accurately. We theoretically show that this second-order correction improves prediction accuracy and effectively suppresses cumulative error. Moreover, D2Cache adaptively scales second-order deltas using error estimates derived from timestep embeddings, maintaining accuracy across varying cache intervals. Empirically, D2Cache outperforms the state-of-the-art TeaCache across four video diffusion models (Latte, Open-Sora, LTX-video, and Wan2.1) at comparable acceleration rates, with even larger gains under higher acceleration settings.
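The core idea in the abstract can be illustrated with a toy sketch: a first-order caching scheme extrapolates the next output from the residual between adjacent predictions, while a second-order scheme additionally adds the delta between consecutive residuals. This is an assumed, simplified form of the update for illustration only, not the authors' exact D2Cache algorithm (in particular, the adaptive scaling from timestep embeddings is omitted):

```python
import numpy as np

def predict_next_output(outputs):
    """Second-order delta extrapolation (simplified illustration).

    outputs: the last three cached model outputs [o_{t-2}, o_{t-1}, o_t].
    First-order residual:  r_t = o_t - o_{t-1}
    Second-order delta:    d_t = r_t - r_{t-1}
    Prediction:            o_{t+1} ~= o_t + r_t + d_t
    """
    o_prev2, o_prev1, o_curr = outputs
    r_prev = o_prev1 - o_prev2   # first-order residual r_{t-1}
    r_curr = o_curr - o_prev1    # first-order residual r_t
    d = r_curr - r_prev          # second-order delta d_t
    return o_curr + r_curr + d   # quadratic extrapolation of o_{t+1}

# For a quadratically evolving output (o_t = t^2: 0, 1, 4, ...)
# the second-order prediction is exact:
o = [np.array([0.0]), np.array([1.0]), np.array([4.0])]
print(predict_next_output(o))  # -> [9.]
```

A first-order cache (returning `o_curr + r_curr`) would predict 7 here; the second-order delta corrects the curvature of the trajectory, which is the source of the suppressed cumulative error claimed in the abstract.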