ARCache: Mitigating Error Accumulation for Caching-based Acceleration in Autoregressive Video Diffusion Models
Abstract
Caching-based acceleration methods have recently driven significant progress in efficient video generation with diffusion models. However, we identify a critical limitation when these techniques are applied directly to autoregressive video diffusion models, which generate long videos by sequentially synthesizing segments conditioned on historical context. In this setting, the approximation errors introduced by acceleration propagate to later segments and accumulate over time, progressively degrading video quality. To address this challenge, we propose ARCache, the first training-free caching-based acceleration framework specifically designed for autoregressive video diffusion models. ARCache improves both the timing and the quality of caching through two key components. First, History-Guided Cache (HGC) leverages historical information to adaptively schedule caching for each segment, enabling more accurate and efficient cache utilization. Second, Enhanced Residual Correction (ERC) adaptively approximates model residuals and refines the residual trajectory for subsequent segments, mitigating error accumulation while reducing computational overhead. Extensive experiments on Framepack-F1, SkyReels-V2, and the autoregressive world model Matrix-Game demonstrate that ARCache achieves state-of-the-art acceleration and visual fidelity.
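To make the setting concrete, the following is a minimal toy sketch of generic residual caching inside an autoregressive segment loop. It is not the ARCache method (HGC and ERC are not implemented here). The denoiser `toy_model`, the drift-based reuse rule, and all names and parameters are hypothetical and chosen only for illustration. The sketch shows where an approximation made in one segment enters the context of the next, and it provides a helper for measuring the per-segment deviation from an uncached run.

```python
import math

def toy_model(x, t, context):
    """Hypothetical stand-in for one expensive denoiser call (not a real model)."""
    return [0.9 * xi + 0.1 * math.tanh(ci) - 0.01 * t for xi, ci in zip(x, context)]

def generate_segment(context, steps, threshold):
    """Denoise one segment, reusing a cached residual while input drift stays small."""
    x = list(context)            # toy initialization from the context
    cached_residual = None
    prev_input = None
    drift = 0.0                  # accumulated relative L1 change of the input
    calls = 0                    # number of full model evaluations
    for t in range(steps, 0, -1):
        if prev_input is not None:
            num = sum(abs(a - b) for a, b in zip(x, prev_input))
            den = sum(abs(b) for b in prev_input) + 1e-8
            drift += num / den
        if cached_residual is None or drift >= threshold:
            out = toy_model(x, t, context)                 # full computation
            cached_residual = [o - xi for o, xi in zip(out, x)]
            drift = 0.0
            calls += 1
        else:
            out = [xi + r for xi, r in zip(x, cached_residual)]  # cached approximation
        prev_input = x
        x = out
    return x, calls

def generate_video(first_context, segments, steps, threshold):
    """Autoregressive loop: each segment is conditioned on the previous output."""
    context = list(first_context)
    outputs, total_calls = [], 0
    for _ in range(segments):
        segment, calls = generate_segment(context, steps, threshold)
        outputs.append(segment)
        total_calls += calls
        context = segment        # any approximation error is carried forward here
    return outputs, total_calls

def per_segment_error(reference, approx):
    """Maximum absolute deviation of each segment from the reference run."""
    return [max(abs(a - b) for a, b in zip(r, s)) for r, s in zip(reference, approx)]
```

Running `generate_video` once with `threshold=0.0` (every step is computed in full) and once with a positive threshold, then passing both outputs to `per_segment_error`, gives the deviation of each segment from the uncached run alongside the number of model calls saved.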