Learned Image Compression via Sparse Attention and Adaptive Frequency
Abstract
Learned image compression (LIC) methods surpass traditional algorithms in rate-distortion (RD) performance, but still struggle to balance effectiveness and efficiency. Moreover, many methods overlook frequency-domain information, and even the few recent methods that incorporate fixed frequency transforms lack content-adaptive capability. We therefore propose an efficient spatial-frequency dual-path LIC method. For the spatial path, we introduce Cross-Sparse Window Attention, which leverages sparse, window-conditioned global tokens to efficiently model long-range dependencies; it achieves lower computational cost and better effectiveness than standard Window-based Multi-head Self-attention. For the frequency path, we design a content-adaptive frequency transform that employs a decomposition weight generator and learnable global weights to adaptively process multi-scale frequency components. Furthermore, we propose Denoising-as-Regularizer, a training-only module that structures and smooths the latent representation via a denoising task, improving reconstruction quality at zero inference cost. Experiments on the Kodak, CLIC, and Tecnick datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art methods in both RD performance and latency.
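To make the idea of window attention augmented with a small set of global tokens concrete, here is a minimal NumPy sketch. It is an illustrative toy only, not the paper's Cross-Sparse Window Attention: the pooling scheme, the way global tokens are selected, and all function and variable names (`window_attention_with_global_tokens`, `n_global`, etc.) are assumptions for exposition. The point it shows is that each window attends to its own tokens plus a shared sparse set of window-derived global tokens, so long-range context is available at a cost far below full global attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_with_global_tokens(x, window=4, n_global=2, seed=0):
    """Toy window attention where every window also attends to a small,
    shared set of global tokens pooled from the windows themselves.

    x: (H, W, C) feature map, H and W divisible by `window`.
    This is a hypothetical sketch, not the authors' actual design.
    """
    H, W, C = x.shape
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)  # query projection
    Wk = rng.standard_normal((C, C)) / np.sqrt(C)  # key projection
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)  # value projection

    # One candidate global token per window via mean pooling, then keep only
    # n_global of them -- a crude stand-in for a sparse, window-conditioned set.
    wins = x.reshape(H // window, window, W // window, window, C)
    pooled = wins.mean(axis=(1, 3)).reshape(-1, C)
    g = pooled[:n_global]

    out = np.empty_like(x)
    for i in range(H // window):
        for j in range(W // window):
            # Local tokens of this window, flattened to (window*window, C).
            tokens = x[i*window:(i+1)*window, j*window:(j+1)*window].reshape(-1, C)
            # Keys/values cover local tokens plus the shared global tokens.
            kv = np.concatenate([tokens, g], axis=0)
            q, k, v = tokens @ Wq, kv @ Wk, kv @ Wv
            attn = softmax(q @ k.T / np.sqrt(C))
            out[i*window:(i+1)*window, j*window:(j+1)*window] = \
                (attn @ v).reshape(window, window, C)
    return out
```

Per window, the attention matrix is `(window², window² + n_global)` rather than `(H·W, H·W)`, which is the efficiency argument the abstract makes against standard window attention extended naively to global scope.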