

Poster

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu · Shihe Wang · Jiayi Zhang · Jiaxin Chen · Yunhong Wang


Abstract:

Network quantization, a prevalent technique for network compression, significantly reduces computational demands and memory usage, thereby facilitating the deployment of large-parameter models on hardware with constrained resources. Post-training quantization (PTQ) stands out as a cost-effective and promising approach because it avoids retraining. Unfortunately, many current PTQ methods for Vision Transformers (ViTs) suffer a notable drop in accuracy, especially in low-bit settings. To tackle these challenges, we analyze the widely used Hessian-guided quantization loss and uncover limitations of the approximated pre-activation Hessian. By deriving the relationship between the KL divergence and the Fisher information matrix (FIM), we develop a more refined approximation of the FIM. Building on this, we introduce the Diagonal Plus Low-Rank FIM (DPLR) to achieve a more accurate quantization loss. Extensive experiments across various ViT-based architectures on public benchmark datasets demonstrate that our quantization loss surpasses both the prevalent mean squared error (MSE) loss and the approximated pre-activation Hessian, and that our method outperforms previous work in low-bit settings. Code will be released upon acceptance.
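To make the abstract's idea concrete, below is a minimal sketch (not the authors' released code) of how a diagonal-plus-low-rank FIM approximation, F ≈ diag(d) + U Uᵀ, could be turned into a quantization loss of the form ½ Δᵀ F Δ on a weight perturbation Δ. The function name, tensor shapes, and the way d and U are estimated here are illustrative placeholders under that assumption.

```python
import torch

def dplr_fim_quant_loss(delta, diag, low_rank):
    """Illustrative loss 0.5 * delta^T (diag(d) + U U^T) delta.

    delta:    (n,) perturbation introduced by quantization, e.g. W_q - W
    diag:     (n,) diagonal term d of the FIM approximation (placeholder estimate)
    low_rank: (n, k) low-rank factor U of the FIM approximation (placeholder)
    """
    diag_term = torch.sum(diag * delta * delta)   # delta^T diag(d) delta
    proj = low_rank.t() @ delta                   # U^T delta, shape (k,)
    low_rank_term = torch.sum(proj * proj)        # delta^T U U^T delta
    return 0.5 * (diag_term + low_rank_term)

# Toy usage: score a crudely rounded weight vector against the full-precision one.
n, k = 1024, 8
w_fp = torch.randn(n)
w_q = torch.round(w_fp * 4) / 4                   # coarse uniform rounding, for illustration only
d = torch.rand(n)                                 # placeholder diagonal FIM estimate
U = torch.randn(n, k) / n ** 0.5                  # placeholder low-rank factor
loss = dplr_fim_quant_loss(w_q - w_fp, d, U)
print(loss.item())
```

The point of the low-rank term is that it captures off-diagonal curvature at O(nk) cost rather than the O(n²) cost of a full FIM, which is what makes such a loss practical for PTQ calibration.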
