Rethinking Asymmetric Quantization: Hidden Symmetry in Vision Model Weights
Abstract
Post-training quantization (PTQ) enables rapid deployment of deep pretrained models. In the low-bit regime, recent PTQ methods for vision models adopt asymmetric quantization (AsymQ), introducing zero-point offsets to mitigate quantization errors. However, these offsets impose substantial hardware overhead and still fail to fully capture the non-symmetric structure of pretrained weight distributions, leaving many quantization levels unused. In this paper, we reveal a hidden symmetry in pretrained weights: after removing a few sparse outliers, the distribution becomes nearly symmetric. Accordingly, we propose Dense and Additive Sparse Quantization (DASQ), which decomposes the weights into a dense matrix and a sparse matrix. The dense component captures the symmetric structure around zero, while the sparse component models the removed outliers; the two can be processed in parallel with efficient zero-point-free computation. Experiments on image classification, object detection, and instance segmentation show that DASQ surpasses state-of-the-art PTQ methods at lower BOPs (bit operations). On an FPGA, DASQ also achieves higher accuracy and lower power consumption than AsymQ at comparable throughput.
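To make the decomposition concrete, the following is a minimal NumPy sketch of the idea as described above: a small fraction of large-magnitude weights is split off into a sparse outlier matrix, and the dense remainder, now nearly symmetric around zero, is quantized symmetrically with no zero-point. The function name `dasq_decompose` and the `outlier_frac` parameter are illustrative assumptions, not the paper's actual API; the outlier-selection rule here (top-k by magnitude) is one plausible choice.

```python
import numpy as np

def dasq_decompose(w: np.ndarray, bits: int = 4, outlier_frac: float = 0.005):
    """Illustrative dense + additive-sparse decomposition (hypothetical API).

    Splits `w` into a sparse outlier matrix (kept at higher precision) and a
    dense remainder, then quantizes the remainder symmetrically (zero-point-free).
    """
    # 1) Treat the k largest-magnitude entries as sparse outliers.
    k = max(1, int(outlier_frac * w.size))
    thresh = np.partition(np.abs(w).ravel(), -k)[-k]
    mask = np.abs(w) >= thresh
    w_sparse = np.where(mask, w, 0.0)   # sparse outlier component
    w_dense = w - w_sparse              # remainder, nearly symmetric around zero

    # 2) Symmetric quantization of the dense part: scale only, no zero-point.
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
    scale = np.abs(w_dense).max() / qmax
    q = np.clip(np.round(w_dense / scale), -qmax, qmax).astype(np.int8)

    return q, scale, w_sparse

# Reconstruction: dense dequantization plus the additive sparse term.
w = np.random.randn(256, 256).astype(np.float32)
q, scale, w_sparse = dasq_decompose(w)
w_hat = q.astype(np.float32) * scale + w_sparse
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Because the dense matmul needs only an integer product and a single per-tensor (or per-channel) scale, it avoids the zero-point correction terms of AsymQ, which is consistent with the hardware savings the abstract reports; the sparse term is additive and can be applied in a parallel path.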