

Poster

APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers

Zhuguanyu Wu · Jiayi Zhang · Jiaxin Chen · Jinyang Guo · Di Huang · Yunhong Wang


Abstract:

Vision Transformers (ViTs) have become one of the most widely used visual backbones. While ViTs generally achieve higher accuracy than CNNs of comparable size, they often suffer significant accuracy degradation when quantized for deployment, especially under ultra-low-bit post-training quantization (PTQ) schemes. Existing reconstruction-based PTQ methods, which work well for CNNs, see large accuracy drops on ViTs due to inaccurate estimation of output importance and the difficulty of quantizing post-GELU activations. To address these issues, we propose APHQ-ViT, a novel PTQ paradigm based on average perturbation Hessian (APH) importance estimation. First, to assess output importance, we thoroughly analyze the Hessian-loss approximations used in prior work and the causes of their inaccuracy, and propose a more precise average perturbation Hessian loss. Second, to handle the quantization of post-GELU activations, we propose an MLP-Reconstruction (MR) technique that rebuilds the MLP under APH importance estimation. Besides reducing the activation range, MR replaces the MLP's GELU activation with ReLU using only a small unlabeled calibration set, further improving accuracy. With only linear quantizers, APHQ-ViT outperforms existing methods by a substantial margin on 3-bit and 4-bit PTQ of ViTs across different vision tasks.
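The abstract does not give the exact APH formulation, but the general idea of Hessian-based output importance can be illustrated with a short sketch. Below, a Hutchinson-probe estimate of the diagonal Hessian of the task loss with respect to a block's output serves as the per-element weight in a reconstruction MSE. The function names and the probe-based estimator are illustrative assumptions, not the paper's actual derivation.

```python
import torch

def diag_hessian_hutchinson(loss, outputs, n_probes=8):
    """Estimate diag(d^2 loss / d outputs^2) via Hutchinson probes:
    diag(H) ~= E_v[v * (H @ v)] for Rademacher v. A generic proxy for
    Hessian-based output importance, not the paper's APH loss."""
    grad, = torch.autograd.grad(loss, outputs, create_graph=True)
    diag = torch.zeros_like(outputs)
    for _ in range(n_probes):
        v = torch.randint_like(outputs, 0, 2).mul_(2.0).sub_(1.0)  # +/-1 probe
        hv, = torch.autograd.grad(grad, outputs, grad_outputs=v,
                                  retain_graph=True)  # Hessian-vector product
        diag += v * hv  # unbiased estimate of the Hessian diagonal
    return (diag / n_probes).clamp_min(0.0)  # keep importance non-negative

def weighted_reconstruction_loss(fp_out, q_out, importance):
    """Importance-weighted MSE between full-precision and quantized
    block outputs, the objective minimized during reconstruction."""
    return (importance * (fp_out - q_out) ** 2).mean()

# Toy usage: importance of a linear block's output under an MSE task loss.
x = torch.randn(4, 16)
out = torch.nn.Linear(16, 16)(x)
task_loss = torch.nn.functional.mse_loss(out, torch.randn_like(out))
importance = diag_hessian_hutchinson(task_loss, out)
```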
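The MLP-Reconstruction step can likewise be sketched as cloning the full-precision MLP, swapping GELU for ReLU, and tuning the clone so its outputs match the original's on the unlabeled calibration set under the importance-weighted loss. The timm-style `.act` attribute and the Adam-based loop are assumptions for illustration, not the paper's stated procedure.

```python
import copy
import torch

def mlp_reconstruction(mlp_fp, calib_inputs, importance, steps=500, lr=1e-4):
    """Hypothetical sketch of the GELU -> ReLU swap described in the
    abstract; assumes a timm-style Mlp exposing an `.act` submodule."""
    mlp_relu = copy.deepcopy(mlp_fp)
    mlp_relu.act = torch.nn.ReLU()  # the activation-function replacement
    opt = torch.optim.Adam(mlp_relu.parameters(), lr=lr)
    for step in range(steps):
        x = calib_inputs[step % len(calib_inputs)]
        with torch.no_grad():
            target = mlp_fp(x)  # full-precision GELU output as the target
        out = mlp_relu(x)
        loss = (importance * (out - target) ** 2).mean()  # weighted MSE
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mlp_relu
```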
