Skip to yearly menu bar Skip to main content


Sheared Backpropagation for Fine-tuning Foundation Models

Zhiyuan Yu · Li Shen · Liang Ding · Xinmei Tian · Yixin Chen · Dacheng Tao

Arch 4A-E Poster #104
[ ]
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Fine-tuning is the process of extending the training of pre-trained models on specific target tasks, thereby significantly enhancing their performance across various applications.However, fine-tuning often demands large memory consumption, posing a challenge for low-memory devices that some previous memory-efficient fine-tuning methods attempted to mitigate by pruning activations for gradient computation, albeit at the cost of significant computational overhead from the pruning processes during training.To address these challenges, we introduce PreBackRazor, a novel activation pruning scheme offering both computational and memory efficiency through a sparsified backpropagation strategy, which judiciously avoids unnecessary activation pruning and storage and gradient computation.Before activation pruning, our approach samples a probability of selecting a portion of parameters to freeze, utilizing a bandit method for updates to prioritize impactful gradients on convergence. During the feed-forward pass, each model layer adjusts adaptively based on parameter activation status, obviating the need for sparsification and storage of redundant activations for subsequent backpropagation. Benchmarking on fine-tuning foundation models, our approach maintains baseline accuracy across diverse tasks, yielding over 20\% speedup and around 10\% memory reduction. Moreover, integrating with an advanced CUDA kernel achieves up to 60\% speedup without extra memory costs or accuracy loss, significantly enhancing the efficiency of fine-tuning foundation models on memory-constrained devices.

Live content is unavailable. Log in and register to view live content