RADAR: VQ-VAE Decoder of VAR is a Good Student for Restoring Against Degradation by Acceleration
Abstract
Visual Autoregressive Modeling (VAR) has recently emerged as a powerful paradigm for image generation that surpasses diffusion models in efficiency and quality. However, accelerating attention computation in VAR remains challenging because attention patterns across scales exhibit strong and complex semantic biases: early coarse-scale tokens dominate global structure, while fine-scale tokens mainly refine local details. Existing acceleration methods rely on heuristic token pruning or fixed attention masks and lack a principled way to balance acceleration against semantic fidelity. In this work, we propose a two-stage acceleration framework for VAR. First, we introduce a semantic-cost-aware masking strategy (SCA-Mask) that quantifies the importance of each attention tile and formulates mask shape design as a cost-constrained optimization problem. This enables adaptive pruning under a given compute budget while preserving essential semantic context. Second, we present Post-Acceleration Adaptation (PAA), a decoder-side fine-tuning scheme that employs internal knowledge distillation to restore image quality from pruned latents. PAA requires no external data and uses a lightweight LoRA-based adaptation, providing a far more efficient alternative to retraining the autoregressive transformer. Comprehensive experiments across multiple VAR tasks demonstrate that our method achieves substantial speedup with negligible loss of visual fidelity, yielding a principled and effective pathway toward fast and high-quality visual autoregressive generation.
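To make the cost-constrained masking idea concrete, the following is a minimal sketch, not the paper's actual SCA-Mask algorithm: given a per-tile importance score and a per-tile compute cost, a greedy selection keeps the highest importance-per-cost tiles until the budget is exhausted. The function name `budgeted_tile_mask` and the greedy heuristic are illustrative assumptions; the paper formulates this as a general cost-constrained optimization.

```python
import numpy as np

def budgeted_tile_mask(importance, cost, budget):
    """Hypothetical sketch of cost-constrained attention-tile selection.

    Greedily keeps tiles with the highest importance-per-cost ratio
    until the compute budget is spent; `importance` and `cost` are
    flat arrays with one entry per attention tile.
    """
    order = np.argsort(-importance / cost)  # best ratio first
    mask = np.zeros(importance.shape, dtype=bool)
    spent = 0.0
    for i in order:
        if spent + cost[i] <= budget:
            mask[i] = True
            spent += cost[i]
    return mask

# Toy example: 6 tiles, uniform cost, budget for half of them.
imp = np.array([0.9, 0.1, 0.5, 0.8, 0.2, 0.4])
cost = np.ones(6)
mask = budgeted_tile_mask(imp, cost, budget=3.0)
print(mask)  # the three most important tiles are kept
```

In practice the selection would be solved per scale (coarse scales warranting a larger share of the budget than fine scales, per the biases described above), but the greedy knapsack above captures the budget-versus-importance trade-off.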