VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping
Haotian Dong ⋅ Ye Li ⋅ Rongwei Lu ⋅ Chen Tang ⋅ Shu-Tao Xia ⋅ Zhi Wang
Abstract
Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, thus restricting acceleration potential. Motivated by the interchangeability of visual tokens, we explore, for the first time, verification skipping in the SD process of visual AR generation to explicitly cut the number of target-model forward passes and thereby reduce inference latency. Based on an analysis of the drafting stage's characteristics, we observe that $\textbf{verification redundancy}$ and $\textbf{stale feature reusability}$ are key factors for retaining generation quality and speedup in verification-free steps. Inspired by these two observations, we propose $\textbf{VVS}$, a novel SD framework that accelerates $\underline{\text{v}}$isual AR models via partial $\underline{\text{v}}$erification $\underline{\text{s}}$kipping, integrating three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained skipped-step scheduling. Consequently, VVS reduces the number of target-model forward passes by a factor of ${2.8\times}$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a superior speed–quality trade-off over conventional SD frameworks and revealing strong potential to reshape the SD paradigm. Our code will be publicly available upon acceptance of this paper.
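To make the core idea concrete, the following is a minimal, hypothetical sketch of a speculative-decoding loop with partial verification skipping: on steps selected by a skip schedule, draft tokens are accepted without a target-model forward pass. The function and parameter names (`draft_step`, `verify_step`, `skip_schedule`) are illustrative placeholders, not the authors' actual VVS implementation, which additionally performs dynamic truncation and token-level feature caching.

```python
def speculative_decode(draft_step, verify_step, num_steps, skip_schedule):
    """Hypothetical SD loop that skips target-model verification on
    scheduled steps, accepting draft tokens directly on those steps.

    draft_step(tokens)            -> list of proposed draft tokens (cheap)
    verify_step(tokens, drafts)   -> accepted prefix of drafts (one target
                                     forward pass; expensive)
    skip_schedule                 -> set of step indices to run verification-free
    """
    tokens = []
    target_calls = 0  # count of target-model forward passes
    for step in range(num_steps):
        draft_tokens = draft_step(tokens)  # cheap draft proposal
        if step in skip_schedule:
            # Verification-free step: accept drafts as-is; in VVS this
            # relies on reusing stale cached target features.
            tokens.extend(draft_tokens)
        else:
            # Standard SD step: one target forward pass to verify drafts.
            accepted = verify_step(tokens, draft_tokens)
            target_calls += 1
            tokens.extend(accepted)
    return tokens, target_calls
```

Skipping half of the verification steps roughly halves the target-model forward passes; the paper's reported $2.8\times$ reduction additionally reflects the multi-token acceptance of SD itself.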