Poster
Automated Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu · David Aponte · Colby Banbury · Daniel Robinson · Tianyu Ding · Kazuhito Koishida · Ilya Zharkov · Tianyi Chen
Structured pruning and quantization are fundamental techniques for reducing the size of neural networks, and they are typically applied independently. Applying these techniques jointly via co-optimization has the potential to produce smaller, high-quality models. However, existing joint schemes are not widely used because of (1) engineering difficulties (complicated multi-stage processes and hardware inefficiencies), (2) black-box optimization (extensive hyperparameter tuning to control the overall compression), and (3) insufficient architecture generalization. To address these limitations, we present the framework GETA, which automatically and efficiently performs joint structured pruning and quantization-aware training on any deep neural network. GETA introduces three key innovations: (i) a quantization-aware dependency graph analysis that constructs a pruning search space, (ii) a partially projected stochastic gradient method that guarantees that a layerwise bit constraint is satisfied, and (iii) a new joint learning strategy that incorporates interpretable relationships between pruning and quantization. We present numerical experiments on both convolutional neural networks and transformer architectures showing that our approach achieves competitive (often superior) performance compared to state-of-the-art joint pruning and quantization methods.
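The abstract does not describe the partially projected stochastic gradient method in detail; the following is a minimal PyTorch sketch of the general idea only, assuming a single relaxed bit-width variable per layer, hypothetical bounds `BIT_MIN`/`BIT_MAX`, and a made-up quantization-error proxy in the loss. It illustrates a partial projection step (project the bit-width variables, leave the weights unprojected), not the authors' actual algorithm.

```python
import torch
import torch.nn.functional as F

# Hypothetical bounds for the layerwise bit constraint; the feasible set
# actually used by GETA is not specified in this abstract.
BIT_MIN, BIT_MAX = 2.0, 8.0


def partially_projected_step(weights, bit_widths, lr=1e-2):
    """Take a stochastic gradient step on all variables, then project ONLY the
    relaxed bit-width variables back onto [BIT_MIN, BIT_MAX]; the weights are
    left unprojected (hence 'partially projected')."""
    with torch.no_grad():
        for w in weights:
            if w.grad is not None:
                w -= lr * w.grad
        for b in bit_widths:
            if b.grad is not None:
                b -= lr * b.grad
            b.clamp_(BIT_MIN, BIT_MAX)  # projection enforcing the bit constraint


# Toy usage: one linear layer with a learnable per-layer bit-width. The loss
# trades task error against a made-up quantization-error proxy that shrinks as
# more bits are allocated, plus a small penalty on the bit budget.
torch.manual_seed(0)
W = torch.randn(4, 8, requires_grad=True)
bits = torch.tensor(8.0, requires_grad=True)
x, y = torch.randn(16, 8), torch.randn(16, 4)

for _ in range(20):
    quant_error = torch.exp(-bits) * W.abs().mean()  # illustrative proxy only
    loss = F.mse_loss(x @ W.t(), y) + quant_error + 1e-2 * bits
    for p in (W, bits):
        p.grad = None
    loss.backward()
    partially_projected_step([W], [bits])

print(f"learned bit-width: {bits.item():.2f} (kept within [{BIT_MIN}, {BIT_MAX}])")
```

Because only the bit-width variables are projected after each update, the layerwise bit constraint holds at every iterate while the weights follow an ordinary stochastic gradient trajectory.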